DOCUMENT RESUME

ED 151 615                                        OiS 67S

TITLE          Emergency Medical Services: Research Methodology.
               Research Proceedings Series.
INSTITUTION    National Center for Health Services Research and
               Development (DHEW/PHS), Rockville, Md.
PUB DATE       Dec 77
NOTE           133p.; Proceedings of a conference (Atlanta, Georgia,
               September 8-10, 1976)

EDRS PRICE     MF-$0.83 HC-$7.35 Plus Postage.
DESCRIPTORS    Administration; Administrative Problems;
               Administrator Role; Conference Reports; Economic
               Research; Emergency Programs; Field Check; *Medical
               Care Evaluation; *Medical Services; Program
               Effectiveness; *Program Evaluation; Research Design;
               *Research Methodology; Social Science Research
IDENTIFIERS    *Emergency Medical Service

ABSTRACT
The thirteen papers included here were presented at a conference on the importance of systematic research in evaluating the Emergency Medical Services (EMS) system and administrative functions. The first paper spells out the roles and responsibilities EMS administrators incur when they make a commitment to participate in a research project. An additional paper attempts to explain the role that research data can play in the resolution of operational problems in service agencies. Another paper describes the types and levels of evaluation research being done. Papers are also presented on experimental design, attitude measurement, the development of indicators of program effectiveness, measuring the monetary value of lifesaving programs, randomized field tests, and putting together evaluation research staff. Finally, several papers which deal with research conducted in a police setting are included to offer an instructive analogy to the conflicting yet mutually dependent roles of administrator and evaluator which are also present in EMS services. (BB)

Reproductions supplied by EDRS are the best that can be made from the original document.

RESEARCH PROCEEDINGS
SERIES

Emergency Medical Services: Research Methodology

Proceedings of a conference held in
Atlanta, Georgia, September 8-10, 1976

December 1977



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL INSTITUTE OF EDUCATION POSITION OR POLICY.



U.S. DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
Public Health Service

National Center for Health Services Research

DHEW Publication No. (PHS) 78-3195



National Center for Health Services Research
Research Proceedings Series

The Research Proceedings Series is published by the National Center for Health Services Research (NCHSR) to extend the availability of new research announced at conferences, symposia and seminars sponsored or supported by NCHSR. In addition to publishing the papers given at key meetings, this series includes discussions and responses whenever possible. The series is intended to help meet the information needs of health services providers and others who require direct access to concepts and ideas evolving from the exchange of research results.



Abstract 

The focus of this conference, held September 8-10, 1976, in Atlanta, Georgia, was the importance of systematic research in evaluating the Emergency Medical Services system and administrative functions. Presentations made at the conference and compiled in this document deal with a range of conceptual and methodologic issues. Particular attention is given to the opposing yet mutually dependent roles of the administrator and evaluator. Several papers presenting aspects of research conducted in a police setting offer an instructive analogy to emergency medical services systems.



Foreword 



The Emergency Medical Services Systems Act of 1973 (P.L. 93-154) established comprehensive regional emergency medical services (EMS) systems in an attempt to integrate a number of public and private services, including communications, transportation, personnel and facilities, into coordinated programs designed to save lives and to reduce disability. In the 1970s, however, we are moving from pursuing health goals "at any price" to a realization that our resources are limited, and we must make deliberate choices. The goal of "the best for everyone" provides no guidance for deciding among alternative system designs and alternative uses of scarce resources. The EMS Systems Act focuses on improving the effectiveness of emergency services; a growing national concern for containing the rapidly rising costs of health care introduces the requirement that system efficiency be considered as well.

This conference, held in Atlanta, Georgia, September 8-10, 1976, assessed the value of research methods in analyzing and evaluating EMS systems. The conference emphasized the critical role of the system administrator as both a facilitator and a user of evaluative research. In addition to conceptual and methodologic presentations, a group of papers presented an analog case study of the collaboration between The Police Foundation and the Kansas City Police Department. Recurring throughout are references to the conflicting, yet mutually dependent, roles of administrator and evaluator. The police analogy offers an example of successful, if precarious, resolution of those two roles and of the insider-outsider viewpoints. Police work is not emergency medical services work, but the questions of "what difference does it make?" and "what makes a difference?" are there for both public services, and there are operational and political considerations, technology, and evaluative measures (e.g., response time) which are common to both systems. The problems and motivations of the police administrator may offer new insights and approaches for the EMS administrator.

Gerald Rosenthal, Ph.D.
Director

December 1977






Contents

FOREWORD
    Gerald Rosenthal
    National Center for Health Services Research

INTRODUCTION
    Lee Sechrest
    Florida State University

Administrative Functions and Research Requirements
    Lee Sechrest
    Florida State University

Research in the Context of Delivery of a Critical Public Service: The Kansas City, Missouri Police Department Experience
    Major Lester N. Harris
    Kansas City, Missouri Police Department

Evaluation Results and Decision-Making: The Need for Program Evaluation
    Lee Sechrest
    Florida State University

Evaluation Research: What Is It and How Is It Done?
    Linda Victor Esrov
    Florida State University

Experimental Design and Causal Inference
    Lee Sechrest
    Florida State University

Social Attitudes and Program Evaluation
    Russell D. Clark, III
    Florida State University

Recruitment, Selection, Training and Supervision of Civilian Observers to Work in Police Patrol Operations Research
    William Bieck
    Kansas City, Missouri Police Department

Developing Indicators of Program Effectiveness: A Process
    George L. Kelling
    Police Foundation

Measuring the Monetary Value of Lifesaving Programs
    Jan Paul Acton
    The Rand Corporation

Economic Analysis and the Evaluation of Medical Programs
    Jan Paul Acton
    The Rand Corporation

Appropriateness and Feasibility of Randomized Field Tests
    Robert F. Boruch
    Northwestern University

Development of Staff for Evaluations (A Retrospective View)
    George L. Kelling
    The Police Foundation

Evaluation of Experiments in Policing: What Are We Learning?
    Joseph H. Lewis
    The Police Foundation

Biographical Sketches



Introduction 






It has become widely apparent that at least a part, and often a large part, of the cause of poorly planned and implemented program evaluation research is the inhospitable climate that exists for such research in many systems and organizations. The climatic insufficiency may involve lack of understanding of the need for evaluation, outright hostility to evaluation, or a lack of appreciation for the conditions required for a good evaluation to take place. If program administrators do not look favorably on evaluation, it is virtually certain that even if evaluation is attempted, it will be unsuccessful. However, even enthusiastic program administrators may obstruct, impede, and destroy evaluation attempts for want of understanding of the rather demanding conditions which must be met in order for evaluation research to succeed. Numerous other writers have made the same and additional points on the topic (e.g., Campbell, 1969, 1975a, 1975b; Gurel, 1975; Rivlin, 1971; Weiss, 1970, 1972, 1973, 1975).

A part of the problem that administrators have with evaluation research undoubtedly stems from their perceived vulnerability to potentially unfavorable outcomes, vulnerability that is often enhanced by their very own promises about what a program will produce, by what Campbell (1969) calls the overadvocacy trap. However, not only may administrators be less vulnerable than they suppose; with a really good understanding of the nature and purposes of evaluation, they might come to see it as a potentially valuable tool to be used in the accomplishment of successful programs. With a better understanding of why and how good evaluation research is carried out, administrators might also be less likely to impede or subvert the research by decisions made in relation to it. For example, they might be more willing to plan for strong evaluation in the first place, to provide the resources necessary to carry out the evaluation, to refrain from operational changes that would drastically affect the evaluation, etc. The view here is that program administrators are not necessarily and inherently the enemies of evaluators. Even with their informed cooperation, good evaluation research is difficult to achieve; without it, good evaluation research is impossible.

Another point which might be made by way of background is that research, such as it is, into the factors affecting the utilization of research by policy makers points to the importance of involving policy makers in the research whose results are to be applied (e.g., Havelock, 1969; Salasin & Davis, 1975). Not only does involvement of administrators in ongoing research result in a degree of co-opting that might make them more interested in the findings, but they also may have a greater appreciation of the nature and potential use of the results by having had a hand in producing them.

Clearly there is a need for high quality research in emergency medical services. Yet there is a dearth of proposals of any quality at all. While the reasons for lack of good EMS proposals are undoubtedly complex, perhaps in some degree being inherent in the nature of the problems, there is no question that a good part of the problem stems from lack of research talent in EMS systems. There may be additional problems resulting from a lack of strong commitment to doing research in the first place. Because of the importance of emergency services in the overall system of health care in this country, EMS would seem to be an appropriate area in which to attempt a general upgrading of research efforts, including the planning and preparation of proposals.

While there are several possible levels at which one might try to intervene in EMS in order to improve research, e.g., research workers already in the field, Regional EMS offices, etc., the conference was directed toward persons currently involved in the operations of emergency medical services at some level. The aim was to attract administrators with operational and decision-making responsibilities on the grounds that these persons are in a position to facilitate good research if they understand the need for it and the requirements of research that may infringe upon administrative functions.

Although many of the requirements for good quality research may be formulated in the abstract, i.e., without reference to particular fields or content, emergency medical services seemed to be a sufficiently complex potential research area to justify a conference focusing specifically on it. However, there has as yet been relatively little research at all on emergency medical systems and even less that may be presented as exemplary. On the other hand, there has been in the past few years a rather surprising quantity of good quality research on police practices. There are many similarities in the research problems likely to be encountered in police and emergency medical services research since both involve the delivery of a critical public service, often under considerable pressure. Both of them are public services, i.e., they cannot choose their clientele, and both of them involve delivery of services by individuals with less than professional education and training, and typically by persons with no more than high school education.

In view of the above considerations it seemed potentially worthwhile to involve in the EMS conference a number of persons with experience in the police research field. There was no thought that any kind of simple correspondence could be made between police and EMS systems, but it was thought that the experiences in police research would be relevant and instructive. Since the Kansas City, Missouri, Police Department has been involved in some of the largest and most innovative police research projects, participation of individuals associated with those projects was solicited. In addition, it was believed that the experiences of The Police Foundation, which has funded and monitored much of the police research work which has been done recently, would be of great interest.

The aim of the conference was not to make researchers out of administrators, but to try to convey a sense of the importance of systematic research and of the nature of research, especially as it relates to operational and administrative functions and goals. The topics chosen for the papers were meant to reflect a range of views and issues, hopefully in a way quite meaningful and comprehensible to EMS administrators. The papers were not intended for use by the professional research community.

The panelists who made presentations at the conference were:

Lee Sechrest, Ph.D., Research methodologist, Florida State University, Conference Director.

Robert Boruch, Ph.D., Research methodologist, Northwestern University.

Jan Acton, Ph.D., Economist, RAND Corporation.

William Bieck, Project Director, Response Time Analysis Study, Kansas City, Missouri, Police Department.

Russell D. Clark, III, Ph.D., Social psychologist, Florida State University.

Linda Esrov, Ph.D., Research methodologist, Florida State University.

Lester Harris, Major, Kansas City, Missouri, Police Department.

George Kelling, Ph.D., Sociologist, The Police Foundation.

Joseph Lewis, Executive Director, The Police Foundation.

Robert Thorner, D.Sc., National Center for Health Services Research.

The Conference agenda was approximately as follows:

Introductory remarks.
Priorities in emergency medical systems research.
Evaluation results and decision making: the need for program evaluation.
Types and levels of program evaluation.
Problems in causal inference.
Evaluation experiment simulation exercise.
Research in the context of delivery of a critical public service.
Measuring the outcomes of social programs.
Direct and indirect outcome measures.
Program assessment simulation exercise.
Social attitudes and program evaluation.
Cost benefit and cost effectiveness.
Simulation exercise and discussion.
Project administration and data quality control.
Examples of good evaluations.
The politics of evaluation and implementation of findings.
Putting together a good evaluation research team.
Funding of research on emergency medical systems.




Administrative functions
and research requirements

Lee Sechrest
Professor of Psychology
Florida State University
Tallahassee, Florida

The first paper spells out the roles and responsibilities administrators incur when they make a commitment to participate in a research project. A similar and equally demanding paper could be written about the responsibilities of researchers working in a service delivery setting. It should not be thought that the problems and the shortcomings are all on one side.



In order for good quality research to be planned and carried out it is essential to have full support from administrators in agencies involved in the research. That statement might seem so obvious as not to need utterance, but it is unfortunately the case that, however obvious the principle, all too frequently the quality of research suffers drastically because of lack of administrative commitment and support. In part the problems may stem from failure of administrators to understand research needs, in part from a failure to understand what they are really getting into in beginning a research project, in part from the inexorable political and public pressures that surround the delivery of all critical services, and in part the problems clearly stem from failure of researchers to understand the service delivery context and the administrative role. We want here to clarify as much as possible the way in which administrative functions impinge upon research. It is our expectation that in at least some degree to be forewarned is to be forearmed, and perhaps with better understanding on both sides of what is involved in doing quality research in a service delivery setting, at least some of the difficulties and perhaps most of the disasters can be obviated.

There is, to begin with, the recognition that a problem exists and that systematic research might provide information useful in solving the problem. It has been evident to many of us involved in action research settings that problems are not always equally well recognized by administrators and researchers, are not necessarily defined in the same terms by both groups, and that a conviction of the probable value of systematic research is often lacking in administrators. Researchers tend for obvious reasons to have a broader perspective on problems than do most administrators. Researchers tend to be concerned with the general case while administrators are concerned with their own particular agency. Consequently in some instances a researcher may see and want to work on a problem which simply does not exist in or is not of concern in the setting in which the work is to be carried out. Researchers may, for example, be interested generally in the relationship between training and performance of medical personnel while in a particular health delivery setting that concern may be minimal, perhaps even justifiably minimal. An administrator may see a problem as involving limited resources with which to work while a researcher might prefer to define the problem in terms of optimizing distribution of resources available. When there is not a congruent recognition and definition of the problem to be worked on by both administrators and researchers, trouble is at hand. There will be a differential commitment to the research, different notions about its goals and how to reach them, and discrepant views of the importance of research as opposed to administrative priorities in subsequent decision making.

Clearly, then, a first step in the planning of the research process is to ensure that administrators and researchers have a common definition of the problem. If the research is directed toward a problem of general interest, perhaps one involving fundamental principles rather than immediate and local concerns, it is important that the administrator recognize as well as the researcher the need for work on the problem and the perhaps somewhat altruistic contribution that his agency will be making. Without that equality of recognition and commitment to the idea of research as a sort of societal obligation, researchers and administrators are bound to clash when, as is inevitable, the pressures of operational problems begin to lead to changes in procedures that will weaken or even ruin the research. An administrator who is "talked into" participating in a project in which he has no interest or for which he feels no enthusiasm is making a mistake in ever beginning the project.

When a research project is planned and undertaken in an action setting, there are a great many implicit restrictions on the freedom of an administrator to operate in a normal manner. It is highly desirable that these restrictions be made explicit and that they be discussed frankly and fully between the researcher and the administrator. It may even be a good idea to write them down and have both parties initial the document to which they are agreeing. Unfortunately, it is not always the case that the restrictions are recognized in advance by either party; and if they were recognized, probably a good many projects would not be undertaken, which might be for the best.

The specific restrictions that may be implicit in research plans will differ somewhat from one project or one setting to another, but there are some common ones that can be stated. First, if the project is an experiment, certainly if it is a true experiment and often even if it is only a quasi-experiment, an administrator will be restricted by the plan in the terms on which services can be delivered. The design of the experiment may call for some persons to receive services while they are withheld from others, or for different persons to receive different services, and the administrator may not be allowed to participate in that decision. Treatments may be allocated randomly to people (or to whatever units are involved), they may be allocated serially, or in any one of many other possible ways. It is a potentially severe restriction on an administrator's authority if he or she cannot decide to whom or how services are to be delivered.
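
The difference between random and serial allocation can be made concrete with a small sketch. It is an illustration only, written in Python; the case identifiers, the names of the treatment arms, the seed, and the function names are all hypothetical and are not drawn from any study discussed in these papers.

```python
import random


def allocate_randomly(case_ids, arms, seed):
    """Assign each case to an arm by an independent random draw.

    A fixed seed makes the whole plan reproducible, so it can be
    audited after the experiment is over.
    """
    rng = random.Random(seed)
    return {case: rng.choice(arms) for case in case_ids}


def allocate_serially(case_ids, arms):
    """Assign cases to arms in strict rotation, in order of arrival."""
    return {case: arms[i % len(arms)] for i, case in enumerate(case_ids)}


# Hypothetical cases and treatment arms, for illustration only.
cases = [f"case-{n:03d}" for n in range(1, 9)]
arms = ["service A", "service B"]

random_plan = allocate_randomly(cases, arms, seed=1976)
serial_plan = allocate_serially(cases, arms)

for case in cases:
    print(case, "| random:", random_plan[case], "| serial:", serial_plan[case])
```

In either scheme the plan, not the administrator, decides who receives which service. Under serial allocation anyone who knows the order of arrival can predict the next assignment; under random allocation the assignments cannot be read off from the order of the queue.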

A particularly troublesome problem for most administrators arises in those cases in which the research design calls for withholding of some form of treatment for some cases. Even though the very reason for doing the research in the first place may be that the effectiveness of a treatment is open to serious question, it may still be difficult for an administrator in a politically sensitive setting to make the decision, and then to stick by it, of letting some cases go without treatment or be exposed to what is feared to be an inferior treatment. As is pointed out elsewhere in these papers, there is a powerful tendency for the effectiveness of treatments to become assumed before there is any evidence. Nonetheless, on occasion the administrator wishing to pursue a certain line of research may have to steel himself to the risk of a "no treatment" control group. Once the commitment is made, it is important that it be adhered to until the evidence is firm one way or another. The costs of mounting an experiment at all are usually too high to think of having it aborted.

It should also be understood that there are similar restrictions on other persons actually delivering the services, and one of the tasks of the administrator may be to assist in enforcing the experimental plan. Physicians may have to be told, for example, that treatment plans are to be followed even when it goes against their own personal inclinations or even judgment. In a study of the value of diverting certain juvenile offenders from the criminal system it was found that some police officers were using their knowledge of the serial process by which juveniles were being assigned either to diversion or to custody to gain the type of treatment they thought best for particular kids they worked with. The police officers were getting into the records files after hours and changing the order of the cases that would be assigned the next day. In becoming involved in a research project, administrators assume at least some responsibility for the scientific integrity of the project. Even under the best of circumstances it is difficult to maintain randomization of treatments, and an administrator can be of great help if he determines that the experimental plan will be carried out.
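
The weakness exploited in the juvenile diversion example is that a serial scheme ties the assignment to a case's position in the file. The sketch below, in Python, contrasts that with an assignment derived from the case identifier itself, so that reordering the file changes nothing. This is a present-day illustration and not a description of what that study did; the case numbers, the secret key, and the function names are hypothetical.

```python
import hashlib
import hmac

ARMS = ("diversion", "custody")


def assign_serially(ordered_cases):
    """Serial scheme: the arm depends on position in the file."""
    return {c: ARMS[i % len(ARMS)] for i, c in enumerate(ordered_cases)}


def assign_by_case_id(case_id, secret):
    """Keyed scheme: the arm depends only on the case ID and a secret
    held by the research staff, never on the order of the records."""
    digest = hmac.new(secret.encode("utf-8"),
                      case_id.encode("utf-8"),
                      hashlib.sha256).digest()
    return ARMS[digest[0] % len(ARMS)]


# Hypothetical queue, and the same queue after someone swaps two files.
queue = ["J-101", "J-102", "J-103", "J-104"]
tampered = ["J-102", "J-101", "J-103", "J-104"]

serial_before = assign_serially(queue)
serial_after = assign_serially(tampered)

secret = "held-by-evaluators"  # hypothetical key
keyed_before = {c: assign_by_case_id(c, secret) for c in queue}
keyed_after = {c: assign_by_case_id(c, secret) for c in tampered}

print("serial changed by reordering:", serial_before != serial_after)
print("keyed changed by reordering: ", keyed_before != keyed_after)
```

A technical safeguard of this kind narrows one avenue of subversion only. It does not replace the administrator's commitment to seeing that the experimental plan is carried out.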

Administrators also very often lose freedom with respect to at least some of the characteristics of the treatment that is being administered. In particular the freedom to make changes or other adjustments in the form of the treatment may be sacrificed for the duration of the experiment. It is readily apparent to most people why in the course of testing a drug it is impermissible to change the drug in any way during the trial. It is seemingly more difficult to see and accept, but it is equally impermissible to change other treatments during the time they are being tested. If one wanted to test the efficiency of some type of emergency room organization against an alternative, one would need to decide from the beginning what the new organization should be and then stick with it fairly closely until results became conclusive. One could not, without seriously jeopardizing the interpretation of the experiment, continue to organize and reorganize. Again, the point may seem obvious, but it becomes a sticky issue repeatedly when research is being done in action settings. In the Kansas City police patrol experiment (Kelling et al., 1974), about which more will be said later, it was regarded as of utmost importance that different patterns of patrol be effectively maintained in the experimental and control areas. However, because one pattern being tested went so much against the grain of current police beliefs and practices, there were constant threats of subverting the treatment plan, e.g., by patrolmen entering areas on their own initiative. It required utmost attention from both the experimenters and police administrative officials to maintain the conditions of the experiment reasonably well. The integrity of the experimental treatment is also at least a partial responsibility of the administrator.

Operational procedures not directly part of the experiment may also need to be kept constant during the course of the study. Record keeping systems, for example, should not be changed midstream. In the Hawaii Experimental Medical Care Review Organization (1973), as an especially informative example, a system was established to do peer review of treatment of target diseases in hospitals. Data were available for a baseline period and then for the period following the beginning of peer review. Unfortunately, at the time peer review started there was also a critical change in the wording of one requirement, making it more likely that it would have been met and therefore that peer review would appear effective. Administrators making the decision to participate in a research project may also find themselves being called upon to change their record keeping or data collecting procedures and then find that they are in some degree responsible for data quality control. The requirements of the research may necessitate the keeping of records that would not ordinarily be kept, and the maintenance of data quality control may involve extra monitoring of various persons and processes. These matters should be well understood and worked out before the research begins. Geographical boundaries for service districts may have to be kept the same even though strictly operational considerations would dictate a change. Even changes in personnel may have to be avoided if an experiment is going to produce convincing results. It is worth remembering that an unpersuasive scientific investigation is a waste, and the appearance, as well as the actuality, of objectivity and integrity is important.

Finally, of course, administrators may experience a subjective sense of loss of budgetary control within their organizational units. The budget allocated for research may sometimes seem quite large in relation to the operational budget, and the administrator can find her or himself in a situation in which a lot of money is being spent by a lot of people in ways that are threatening. That threat will be especially likely if some of the administrator's own staff become part of the research project or if the basis for their professional loyalties seems distinctly shifted. There are a lot of research projects in which an administrator is likely to come to wonder just what is in it for him or her.

Again we can offer no panaceas. In the best of situations the administrator and the researchers will have a sense of colleagueship, of being embarked together on an important and ultimately rewarding venture. That sense of joint responsibility and cooperativeness is best fostered by an open relationship from the beginning, one in which each participant has a good understanding of the other's problems and intentions and in which each has a firm commitment to the same goals.

One factor limiting the participation of an administrator in a research project may well be the doubt of the administrator that anything of value is likely to be gained by the research. For one thing, very few administrators of any kind are trained in research, so that they do not understand it and have little appreciation of how it is done and what it may have to offer. There is, in fact, a type of administrative style widely taught and admired in which an administrator engages in a period of "fact finding," depending largely upon subordinates for input, and then enters his inner office for a period of mulling things over before announcing a personal, and correct, decision. Preferably the period of mulling over should be brief so as to maintain a reputation for decisiveness. Research which has been done to date on the utilization of scientific and other information in decision making processes indicates that far too many administrators and managers are interested in research findings only if they confirm what is already believed. Administrators also tend to have limited trust in any research that was not done within their own organizations. That mistrust not only narrows drastically the information sources which are searched, but it may also make many administrators doubt the value of doing research of a more basic or general nature that does not bear directly on a problem of immediate interest.

The foregoing warnings and stringent prescriptions for working out everything in advance should not be taken as indicative of the near impossibility of doing good research at all in an action setting. Rather they are meant to be realistic assessments which, if taken into account, can make the difference between a good research project and a failure. Not all of the problems referred to are likely to be relevant in any one setting, and some of the others can probably be rather easily resolved. Nevertheless, the problems of doing good quality research should not be underestimated. As will be evident from examples presented throughout these papers, the problems are formidable and not likely to be solved satisfactorily by any team who approach the research task with the idea that it is going to be easy. It almost never is. But it is not so difficult as to be effectively impossible.



References

Kelling, G.L., Pate, T., Dieckman, D., & Brown, C.E. The Kansas City Preventive Patrol Experiment: A Technical Report. Washington, DC: The Police Foundation, 1974.

The Hawaii EMCRO: An Experiment in Non-punitive Peer Review. Project Report. Grant No. 3 R18 HS-00795 SRC. National Center for Health Services Research, Rockville, Maryland, 1973.






Research in the context of delivery of a critical public service: the Kansas City, Missouri Police Department experience



Kansas City, Missouri Police Department



Major Lester Harris has spent all his career as a policeman in the Kansas City, Missouri Police Department. He had overall responsibility for the Kansas City police patrol experiment and has been heavily involved in the other research projects carried out and underway in that Department. Major Harris was asked to discuss the ways in which research came to be a regular activity of his Department and the problems that are involved in carrying out research while at the same time having responsibility for public services of a highly visible and even critical nature. It is believed that there are enough similarities between the politics, structure, and missions of a major police department and an emergency medical system or rescue service to make the lessons from the police department instructive.



Prior to beginning a description and discussion of research and planning within an organization, at least a description of that organization is needed in order to provide some context for the information.

The Kansas City, Missouri Police Department, unlike the vast majority of municipal police departments, is not under the administrative control of the city government. The department is under state control, operating and administered under the provisions of Missouri Statutes, sections 84.350 through 84.890. Under provisions of these statutes, the governor of the State of Missouri, with consent of the senate, appoints four citizens of Kansas City as a Board of Police Commissioners. The Mayor of the City, by virtue of his office, is the fifth member of the Board. The power and responsibility of police service is vested in this Board of Police Commissioners. The Chief of Police is appointed by the Board of Police Commissioners and serves at the pleasure of the Board. The statutes define the powers and responsibilities of the Board and the Chief, set forth rank structure and salary ranges, address personnel administration matters, define arrest powers, set forth budgeting and fiscal provisions, etc.

Within the police department, the topmost or largest organizational entities are termed "bureaus." Presently there are four bureaus: Operations Bureau, Administration Bureau, Investigations Bureau, and Services Bureau. The bureau commanders report to the Chief of Police and, with the exception of a few functions whose heads report directly to the Chief of Police (e.g., Media Liaison and Legal Advisor), all organizational elements are a part of and subordinate to one of the bureaus. The next level of organizational elements below bureaus are called "divisions" and they are in turn divided into "units." The organizational structure is not considered to be sacred or permanently fixed; it is only a framework within which resources are organized in order to facilitate coordinated efforts toward departmental objectives. Alterations are made in the organizational structure as needed.

Department personnel strength is presently 1,212 law enforcement personnel, 479 full-time regular civilian personnel, 85 part-time school crossing guards (during the school year), 48 temporary contract civilian personnel, and 102 reserve police officers.

Kansas City, Missouri is a city of 316.83 square miles with a 1970 census population of 507,409. It is the principal municipality of a metropolitan area with a 1970 census population of 1.4 million. There are parts of three counties within the city limits, and the western city limit is formed by the Missouri-Kansas state line. In 1975 there were 46,530 Part I criminal offenses reported to the police department.

The department first established a Planning and Research Unit in about 1953. The unit at that time, and for the following decade, was staffed with only two or three officers. This unit maintained a small departmental library and compiled necessary information and statistics for a department annual report, which is required by state statutes. The resources and efforts of the unit were otherwise involved essentially in routine staff studies and the development and writing of procedures as directed by the Chief of Police or necessitated by current demands on the department. For example, during the period of 1957 through 1963, the city of Kansas City, Missouri annexed a total of 235 square miles in annual increments. The Planning and Research Unit performed much of the staff work necessary to expand police service to these annexed areas each year, such as determining patrol beat boundaries, etc. The unit did very little work which could properly be termed experimental or innovative in nature. Our department was certainly not alone in this regard, however. The prevailing attitude in policing was that "if it has worked for the past twenty years there's no reason to change it."

One notable exception to this relatively nonprogressive stance was the introduction in 1953 of patrol cars manned by only one officer instead of the traditional two officers. This very significant departure from tradition was implemented in the Kansas City, Missouri Police Department by Chief Bernard Brannon, and continues to this day to be a controversial issue in many other departments. This innovation, along with his strong advocacy of an increased educational level for police officers, earned Chief Brannon a national reputation in law enforcement.

When Clarence M. Kelley became Chief of Police in 1961 we were a relatively modern police department, by traditional standards. For the most part the officers were well trained, by contemporary standards, and dedicated to good police performance. Internally, there were problems. The previous Chief of Police and several high ranking officers had recently been indicted by a grand jury on matters of a malfeasance nature. The major issue of the Chief's indictment concerned inaccurate crime reporting and statistics. There were a number of cliques within the organization and this exerted a stronger influence on promotions and assignments than did objectively assessed merit and qualification.

At the time Clarence Kelley became Chief in 1961 he retired from the Federal Bureau of Investigation after more than twenty years of service. Though Kansas City was his hometown, he had not resided there for many years and was not at all familiar with the police department. There were many members of the department, especially among the top ranks, who resented an outsider being appointed as Chief of Police, as opposed to the position being filled from within the organization. Due to this fact and to the politics of the internal cliques, Chief Kelley immediately experienced difficulty in eliciting the candor and dependable information necessary for him to become acquainted with the department and its problems. He, therefore, reorganized the structure of the department, creating eight separate divisions whose commanders reported directly to him. While this is an unconventional structure and a very wide span of control for a police administrator, it served its intended purpose. It enabled Chief Kelley to break up the power cliques, to become familiar with the various operations in a first-hand manner, and to assess the strengths and weaknesses of command and supervisory personnel. After he had accomplished these things he returned to a more conventional organizational structure.

Chief Kelley spent the first several years changing the climate within the department. He stressed the importance, in fact necessity, of integrity in both the individual and organizational sense. He recognized that no one person can administer such a complex function and organization alone, and he stressed the necessity and benefits of involvement of his personnel in the management and progress of the department. He believed and explained that there are two types of mistakes: mistakes of judgement and mistakes of the heart.
Assurance was given that honest mistakes in judgement made in the process of trying to do a good job would not negatively affect one's standing or future in the department. He changed
promotional procedures so that promotions were based on competition and merit, and promotion by virtue of internal politics or favoritism was no longer possible. While the department was a good one by traditional standards when Chief Kelley took office, he was convinced that he had been given a mandate to make it a better one, the best one possible. While he undoubtedly quickly recognized some of the changes which were needed, he realized that constructive change cannot be forced and be successful, hence his efforts to change the climate of the organization to one of integrity, operational ethics, and involvement. The type of research, planning, and progress noted in the following pages could not have occurred had this climate not been created.

Chief Kelley also strongly believed in the utilization of technology in law enforcement and was responsible for the implementation of helicopters as another dimension of patrol. Also, through his efforts a computer was acquired, with the top priority application being that of service and assistance to the police officer on the street. Today the Kansas City, Missouri Police Department computer system serves over fifty criminal justice agencies in Western Missouri and Eastern Kansas in addition to our own, and has been termed by many as "the best police computer system in existence."

While the department is under state control, as described previously, the operating budget must be appropriated by the city government. Like most other organizations, our financial needs have increased each year for many years. These increased needs have been due to a combination of economic inflation, increasing demand for quantity of police service, and the costs associated with programs to improve the quality of public service. In the face of these escalating budget requests, the City Council in 1965 insisted they be given an independent view of the department's operation. The Board of Police Commissioners and Chief Kelley readily agreed to this and as a result a contract was negotiated with the Public Administration Service of Chicago, Illinois for a study of the department. The study was very comprehensive in scope, including administrative, management, and operational facets.

The personnel complement of the department's Planning and Research Unit was increased for the purpose of working with Public Administration Service on the study. Their functions were to assist in acquiring and compiling requested information, provide liaison with the various organizational elements of the department, etc. The Planning and Research Unit had minimal involvement in determining the thrust of the study or in formulating the recommendations that would be forthcoming.

It was decided at the outset of the study that PAS would submit recommendations for change as they were formulated, and that if the change recommended seemed reasonable and held potential for improvement the department would proceed with implementation immediately, as opposed to deferring any and all changes until completion of the study. One reason for this was to get the consultants involved in implementation while still on site. The Planning and Research Unit also assisted in the implementation phase, mainly in a supportive or facilitative role, such as writing procedure manuals, etc.

Overall, the study resulted in a number of recommendations and changes throughout the department. Some of these changes have survived to the present, either in original or subsequently revised form, and others were totally unsuccessful and have long since been discarded.

One of the more significant and controversial changes concerned the organization of the patrol function. Prior to the study, command of the patrol function was vested in the commander of each patrol station area or patrol district, who was responsible to the commander of the Patrol Bureau. The station or district commander had twenty-four hour responsibility for his geographical area. Each had a subordinate field commander responsible to him for each of the three eight hour watches. This was changed to a watch-zone concept as recommended by PAS. There were three watch commanders, one on each eight hour shift, responsible to the Patrol Bureau Commander. Each watch commander was responsible for the patrol function for the entire city, but only during his assigned eight hour watch. The city was divided into three geographical zones, each having a zone commander responsible to his respective watch commander. Under this organizational structure, there was no one below the Patrol Bureau Commander who had twenty-four hour responsibility for the patrol function in any given area of the city. The command structure was built on an eight hour segment of the clock as opposed to a geographical area. Great difficulties were experienced with internal communications, transmittal of orders, citizen satisfaction, and personnel morale. Operation under this structure was almost totally unsatisfactory and in 1971, four and one-half years later, the department reverted to the previous command and organizational structure within the patrol function.

There were some worthwhile improvements and progress made as the result of changes made in response to recommendations made by PAS. Implementation of the changes and realization of the progress did not come easily, however. Hindsight makes it clear that the main obstacles encountered were due to the fact that personnel directly affected by the changes had very little input as to what those changes should be. There was resentment that "outside experts" could come into the department and tell us how we should do things. When people involved in an operation have the opportunity to be significantly involved in the identification of their own problems and development of their solutions they have a vested interest and intense commitment to successful implementation of those solutions. The total realization of this fact is perhaps the most valuable result of the PAS study for the Kansas City, Missouri Police Department, and you will see that it was certainly kept in mind as we structured subsequent research and planning programs.

By the time the PAS study was over, the personnel complement of the department's Planning and Research Unit was approximately twenty. While it was originally intended that most of those transferred to the unit were there on a temporary assignment to work with PAS, the size of the unit was never decreased. Even though one of the PAS recommendations was for the continued existence and utilization of such a staff unit, the main reason the unit was not diminished in size or importance was that Chief Kelley strongly believed in its value to the continued progress of the department. He was convinced that intelligent decisions required that the problem or issue be accurately identified and described, that all pertinent information be accumulated, and that alternatives be identified and evaluated. Certainly he did not have the time to apply this process personally to all issues confronting him, so many of them were assigned to the Planning and Research Unit with a request for study and recommendations.

All members of the unit were sworn law enforcement personnel, with the exception of clerical personnel. The officers assigned to the unit were selected on criteria which emphasized past job performance, intelligence, commitment to professional excellence, and interest in the assignment. The assigned officers had practically no formal training or experience in formal research or planning and they learned and improved through experience. Motivation was very high due to the challenge and to the firm knowledge that the Chief of Police sincerely attached great importance to the worth of the unit's product. Unit personnel proposed to Chief Kelley that he meet with them periodically for informal discussion of matters of current interest and concern. They felt such discussions would be very beneficial by permitting them to become exposed to his philosophy and goals on policing and department administration. He agreed to this, but at the very first such meeting made a statement to the following effect: "I can see where you feel that an understanding of my personal philosophy and goals will be of assistance in your work, but I want to make one point very clear. I don't want you to ever give me staff work or recommendations which are merely an attempt to give me what you think I want to hear. If you do that your contribution to this department will be of minimal value. I want you to approach all issues objectively and give me the benefit of your best thinking and your recommendations. It is my responsibility to accept or reject your recommendations, and in so doing, I am totally responsible for the results, good or bad."

Within the department, the Planning and Research Unit was frequently referred to as "the ivory tower bunch," "the empty holster crowd," and similar terms, sometimes seriously and sometimes in jest. Conscious efforts were made by unit personnel to consult with those assigned to functions potentially affected by the project being worked on and therefore, department personnel were more receptive to change resulting from such internal staff work than had it been developed by outsiders. This does not mean that only information and opinions from within the department were gathered or considered. Depending on the nature of the project, input was also sought from other police departments, criminal justice agencies, business, industry, etc., i.e., any source deemed appropriate and pertinent.

The Planning and Research Unit was kept busy with staff studies and development concerning matters of current and pressing urgency. There was a desire and recognized need by both department management and the staff of the unit to become involved in some research and planning of a more long range nature, but it seemed that the time was just not available. Prompted by these circumstances, plus the recognition of the potential benefits of involving a greater number of departmental personnel in planning, personnel of the Planning and Research Unit in October, 1969, proposed to Chief Kelley the formation of several task forces.

It was proposed that each task force include representation from command, supervisory, and patrolmen levels and that they be charged to research and submit recommendations for future direction relative to some rather broad and general subject areas. This general concept was discussed with Chief Kelley and he reacted with wholehearted support. As a first step he requested that each commanding officer in the department submit a paper to him discussing their assessment of the strengths and weaknesses of the department and their ideas for future changes and direction for the department in pursuit of increased professional excellence. Not only was this a first step in the intended task force organization, but the responses were of great value to Chief Kelley in helping him further assess the individual strengths, weaknesses, and potentials of his commanding officers. Following Chief Kelley's review of these papers they were given to the Planning and Research Unit for review, summarization, and identification of the subjects receiving significant attention. In May, 1970, eight task forces were formed and each was charged to address itself to one of the following subject areas: (1) regionalization of certain police functions; (2) possible additional sources of revenue for the operating budget; (3) educational standards for police; (4) supervisory training and development; (5) human relations, both within the department and with the community; (6) improvement of investigative procedures; (7) improved patrol concepts and procedures; and (8) improved in-service training programs. Each task force was composed of two commanding officers, two sergeants, and two patrolmen or detectives. The commanding officers were appointed by Chief Kelley and they then selected and recruited the other four members of their respective task forces. Since the department was very undermanned, it was necessary to require that all task force members continue their primary duties full time and address their task force assignments as time permitted. They were told that they were free to seek information and assistance from any source willing to provide it, but there was no money available to hire consultants or staff. From this point on, they were on their own except for what assistance the Planning and Research Unit could provide relative to possible sources of information and staff study methodology.

The reports received from these task forces ranged all the way from very brief, elementary and superficial, to very comprehensive with much effort and good thinking quite obvious. Some of the reports received no further action or attention once they were read due to their lack of substance and/or a lack of the means and resources to pursue the subject at the time. Some resulted in varying degrees of changes and new programs within the department in the following two years. Those which prompted change or new programs concerned supervisory and executive training and development, human relations, and in-service training.

Overall, the quality of efforts exerted and reports submitted was quite commendable when one considers the circumstances under which the task force members were asked to produce. They did not possess formal knowledge or skills in research, problem identification, or program development and they were not provided funds to avail themselves of assistance in these areas. They were expected to continue performing their normally assigned duties and do their research and produce their report as an extra assignment. Most of the task forces were composed of members from various units of assignment and, in some instances, who worked different duty hours. While it was originally felt that such diverse representation within a task force would be a positive factor, hindsight indicates that it was not. It is difficult for a person to grasp, get highly motivated toward, and pursue issues foreign to his experience and assigned duties. It also made it very difficult to schedule task force meetings. Another aspect which presented problems was that most of the assigned subject areas were too general and broad and there was much floundering in attempts to identify specific and definable issues to pursue.

Probably the most significant benefit derived from this task force program was the experience and effects on those who were members of the task forces, and not specific changes resulting from the reports. It emphasized the sincerity of Chief Kelley's philosophy of participatory management and desire for the thinking of all members of the department; it stimulated conceptual thinking; and it expanded the participants' awareness and understanding of problems and issues confronting law enforcement beyond those of the individual's specific normal duty assignment.

The next significant phase of the department's research and planning experience resulted from the combination of two events: the creation and mission of the Police Foundation and approval by the voters in Kansas City of an increase in the city's earnings tax from .5% to 1%.

The Police Foundation was created in 1970 with a five year, 30 million dollar grant from the Ford Foundation and a mandate to "assist police agencies in realizing their full potential by developing and funding promising programs of innovation." Representatives of the Foundation visited a number of major police departments to become more familiar with current policing methods and problems and to try to assess the capacity of the departments for the development and implementation of innovative programs. The Kansas City, Missouri Police Department received such a visit by three representatives of the Foundation in early 1971. In the summer of 1971, the Foundation sponsored a two week conference at the University of Wisconsin, attended by members of the departments which had been visited: New York, New York; Baltimore, Maryland; Cincinnati, Ohio; Detroit, Michigan; Dallas, Texas; and Kansas City, Missouri. This conference involved discussion of policing problems, programs, and potentials and was attended by Chief Kelley and six commanding officers from Kansas City. The Foundation had indicated that following the visits to the departments and the conference they would select several of the departments and award them major grants. Shortly after the conference it was announced that the Cincinnati, Ohio and Dallas, Texas Police Departments would receive grants. Since it was very unclear what the potential grants would be for or what relationship the Foundation expected to establish with the departments, the Kansas City, Missouri Police Department did not pursue the award of one of these grants. It is not clear what consideration on the part of the Foundation resulted in Kansas City not being offered one of the grants.

In December, 1970, the voters of Kansas City, Missouri approved an increase in the city's earnings tax from .5% to 1%. The city government had made a commitment to the voters that the great majority of the resulting revenue would be spent for public safety, including the addition of 350 officers to the police department. The department actively and vigorously campaigned for passage of the earnings tax increase, promising that 280 of the 350 additional officers would be assigned to patrol and specifying how many were to be assigned to each patrol division so that voters would know what to expect in the way of increased visible police protection in their particular areas of the city.

Chief Kelley recognized that the addition of these officers provided a rare opportunity to reassess existing patrol strategies and procedures and to develop plans for the deployment and utilization of the additional officers in the most beneficial manner possible. In fact, he felt we were ethically obligated to do so. In late August, 1971, Chief Kelley and members of the command staff again met with representatives of the Police Foundation. The Foundation was informed of the department's intent to study patrol strategies and problems and to pursue improvement and they were invited to consider joining with us and assisting us in these efforts. We made it very clear that any projects were to be a department venture not a Foundation venture; that we would insist on retaining control and responsibility for what was done. Within this context we took the position that we would appreciate the assistance the Foundation could provide and would make all possible efforts to work with them.

After lengthy discussions the Foundation agreed to join with us. While the department fully intended to embark on these efforts concerning patrol, with or without the assistance of the Foundation, we had practically no resources for consultant assistance or other expenses; therefore, the assistance of the Foundation was of great value and facilitated much more comprehensive efforts and projects. The Foundation provided funds for such assistance as consultants, travel to other departments to study various programs, rental and/or remodeling costs for office space, overtime pay, clerical staff, and evaluation.

In October, 1971, four task forces were formed, one within each of the three patrol divisions and one in the Special Operations Division. This division is composed of patrol support functions, i.e., Tactical Unit, Helicopter Unit, Canine Unit, and Police Reserve Unit. Each task force was composed of six to eight members: the division commander (rank of Major); one captain; and the remaining members with the rank of sergeant and patrolman. All three watches, or shifts, were represented in each of the patrol division task forces. The division commander was chairman of the task force but each member had equal input and vote without regard to rank. To provide process assistance and support in problem identification, research, and program development, one officer of the Planning and Research Unit and one Police Foundation consultant were assigned to work with each task force.

Each task force was given a mandate to identify the most critical problems confronting its respective division and to develop and submit recommendations for addressing them. Chief Kelley assured his total support and assured the task forces that their recommendations were to be submitted to him, that he would thoroughly study and consider them, and that he would make the final determination as to their implementation. He stressed the absolute necessity for integrity in all that they might do.

The task force approach was chosen for three main reasons: (1) involvement of people affected most by a program in the development of that program greatly increases the commitment to implementation and enhances the success of the program; (2) it was believed that the persons working in the divisions could best and most accurately identify and assess the contemporary problems facing their respective division; and (3) a firm belief in the individual and collective capacity of the patrol officers. While the task force approach is not usually the most expeditious and efficient procedurally, it was believed that the value of (1) and (2) above made this approach much preferable to any other alternative. In organizing and setting up the task forces we tried to apply lessons learned as the result of mistakes made with the task forces created in 1970 and previously described herein.

It was intended that the task forces be, to the extent
possible, representative of their divisions, and they were
urged to develop and maintain the best communications possible
in order to receive input from all personnel and to keep them
informed of what was going on. This was not an easy thing to
do, especially during the early stages when the task forces
were involved in general discussions and attempting to define
their direction. The various methods used in attempts to
establish and maintain communications included inviting
division personnel to attend task force meetings, memorandums,
having task force members attend regular roll calls
periodically, and having a task force member ride patrol with
the officers.

Task force activity began initially with periodic meetings,
usually weekly, and members otherwise continuing to perform
their normally assigned duties. A number of trips to other
cities were taken by task force members to study other patrol
operations and programs. When this occurred, the member(s)
making the trip were relieved from their normal assignments and
were considered to be on temporary special duty status. Later
in the process some members of the task forces were relieved of
normal duty and assigned full time task force duty to pursue
program development.

Shortly after the task forces were formed a Task Force
Coordinating Council was created. The council was chaired by
the commander of the Patrol Bureau and included the commander
of each of the divisions having task forces and the commander
of the Planning and Research Unit. The purposes of this council
were to provide coordination between the task forces, exchange
information of common interest, avoid unnecessary duplication
of research and other efforts, keep the Patrol Bureau commander
informed of task force activity in all of his divisions,
address policy issues raised by task force activities, and
review task force program proposals. As previously noted, Chief
Kelley retained the responsibility for final approval or
disapproval of task force proposals, so the council could only
attach their recommendations for the Chief's consideration.

At the inception of the task forces the consultant assistance
provided to each task force by the Police Foundation consisted
of individuals with primary employment and responsibilities
elsewhere in the country. These persons would fly in to Kansas
City for task force meetings, usually for one day per week. It
soon became evident that this was not a satisfactory
arrangement and would become even less satisfactory as the task
forces got closer to program development and implementation.
The task forces felt that the arrangement did not permit the
degree of involvement and commitment on the part of the
consultant which they felt was necessary and that the limited
access to him was not adequate for their needs.



As a result of the dissatisfaction with the "fly-in fly-out"
consultant arrangement, the Operations Resource Unit was
created as an organizational element of the Patrol Bureau.
Persons with needed skills were hired by the department on a
contract basis with funds provided by the Police Foundation.
The unit was headed by a regular department member. By this
time it had also been recognized that, in addition to whatever
might result from the task force programs, one of the potential
benefits of the relationship with and support of the Police
Foundation was the acquisition or development of research and
program development skills within our department, which would
remain with us and be of value beyond the current task force
program and affiliation with the Foundation. Accordingly three
patrolmen with a high interest and potential for this type of
work were selected and transferred to the Operations Resource
Unit. This unit did not have the role or authority for making
significant decisions; their primary purpose was providing
process support to the task forces. In addition to the activity
of direct and active process support, the unit provided
computer programming capacity, compiled a library of programs
of interest on a national scope, catalogued information
emerging from task force activities, and provided access to
consultants available nationally when needed.

All of the task forces successfully completed the process of
identifying problems, prioritizing them, and selecting specific
problems for which they developed and implemented new programs
or experimental research. Several of these programs, after
trial and evaluation within the division of origin, have been
implemented and institutionalized throughout all patrol
divisions. Purpose and space of this paper do not permit a
discussion of each of the projects. One, the South Patrol
Division project, will be briefly described because it was
experimental research in nature, was of great significance to
the field of policing, and demonstrated that a police
organization can design and conduct meaningful research.

In response to instructions to all of the task forces to
identify the most critical problems confronting their
respective divisions, the South Patrol Division Task Force
identified five problem areas: (1) residence burglaries;
(2) juvenile offenders; (3) citizen fear of crime; (4) public
education about the police role; and (5) police-community
relations.

"Like the other task forces, the South Task Force was
confronted next with developing workable remedial strategies.
And here the task force met with what at first seemed an
insurmountable barrier. It was evident that concentration by
the South Patrol Division on the five problem areas would cut
deeply into the time spent by its officers on preventive
patrol. At this point, a significant thing happened. Some of
the members of the South Task Force questioned whether routine
preventive patrol was effective, what police officers did while
on preventive patrol duty, and what effect police visibility
had on the community's feelings of security.

Out of these discussions came the proposal to conduct an
experiment which would test the true impact of routine
preventive patrol. . . .

As would be expected, considerable controversy surrounded the
experiment, with the central question being whether long-range
benefits outweighed short-term risks. The principal short-term
risk was seen as the possibility that crime would increase
drastically in the reactive beats; some officers felt the
experiment would be tampering with citizens' lives and
property.

The police officers expressing such reservations were no
different from their counterparts in other departments. They
tended to view patrol as one of the most important functions of
policing, and in terms of time allocated, they felt that
preventive patrol ranked on a par with investigating crimes and
rendering assistance in emergencies. While some admitted that
preventive patrol was probably less effective in preventing
crime and more productive in enhancing citizen feelings of
security, others insisted that the activities involved in
preventive patrol (car, pedestrian and building checks) were
instrumental in the capture of criminals and, through the
police visibility associated with such activities, in the
deterrence of crime. While there were ambiguities in these
attitudes toward patrol and its effectiveness, all agreed it
was a primary function." *

Out of these discussions came a task force proposal to conduct
an experiment to assess the value of the traditional routine
preventive patrol. Chief Kelley, displaying a great degree of
administrative courage when one considers the strong tradition
being questioned and the unknown outcome, gave his approval to
proceed with the experiment. In doing so he imposed two
constraints: (1) the department's responsibility to serve and
protect the public must not be neglected; and (2) the
department's normally low response time to calls for service
must not be impaired. It was agreed that crime statistics would
be monitored closely on a weekly basis and that any significant
increase in the experimental area would result in prompt
termination of the experiment.

The experiment was conducted in a 32 square mile area of the
South Patrol Division having a 1970 census population of
148,395. The 15 patrol beats in this area were computer matched
on the basis of crime data, number of calls for police service,
ethnic composition, median income and transiency of population
into five groups of three. Within each group of three beats one
beat was designated as reactive, one as proactive, and one as
control. In reactive beats all routine preventive patrol was
withdrawn. The assigned patrol unit responded to and handled
calls for service but when not so dispatched and occupied
remained on the beat perimeter or patrolled in an adjacent
proactive beat. In the proactive beats the level of routine
preventive patrol was increased from two to three times normal
through the assignment of additional patrol units and
patrolling of reactive units. The level of patrol in the
control beats remained normal, with one unit assigned to each
beat patrolling in normal manner.

* George Kelling et al., The Kansas City Preventive Patrol
Experiment: A Summary Report (Washington, D.C., 1974), pp. 7-8.
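
The matched-group design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper says one beat in each group was "designated" reactive, proactive, or control, and does not say how; the sketch assumes the designation within each group is made at random. The beat labels and the function name `assign_conditions` are hypothetical, and the computer matching of beats on the five variables is taken as already done.

```python
import random

CONDITIONS = ("reactive", "proactive", "control")


def assign_conditions(matched_groups, seed=None):
    """Give each beat in every matched group of three a distinct condition.

    Random designation within a group is an assumption of this sketch;
    the paper only says that beats were "designated".
    """
    rng = random.Random(seed)
    assignment = {}
    for group in matched_groups:
        if len(group) != len(CONDITIONS):
            raise ValueError("each matched group must contain exactly three beats")
        shuffled = list(CONDITIONS)
        rng.shuffle(shuffled)  # random order of the three conditions
        for beat, condition in zip(group, shuffled):
            assignment[beat] = condition
    return assignment


# Hypothetical labels: 15 beats already matched into five groups of three.
groups = [[f"beat_{g}{i}" for i in "abc"] for g in range(1, 6)]
plan = assign_conditions(groups, seed=1972)

for group in groups:
    print({beat: plan[beat] for beat in group})
```

Because every group contributes exactly one beat to each condition, the three sets of five beats stay comparable on the matching variables, which is the point of matching before designation.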

The experiment was initially started on July 19, 1972, but was
suspended in mid-August when it was recognized that
experimental conditions were not being adequately maintained
and that some problems were evident. Necessary revisions were
made in instructions and guidelines and the experiment was
resumed on October 1, 1972, and reached a successful conclusion
on September 30, 1973. Data was collected by means of ten
different surveys and questionnaires, interviews, observers
riding with officers, and from departmental data (crime,
traffic, arrest, dispatch, officer activity, and personnel
records).



The public was aware that an experiment was being conducted
but was not informed of the specific nature of police patrol
presence in the various beats nor specific locations of the
beats. In one incident a businessman was informed by an
opponent of the experiment that his business was located in an
area from which all police patrol had been withdrawn and a
protest was expressed. Chief Kelley met with business
representatives of the area and explained the nature and
purpose of the experiment and that it was being closely
monitored. At the conclusion of his explanation he received a
standing ovation from those present.

The results of the experiment disclosed that the varying
levels of routine preventive patrol had no effect on actual
crime, reported crime, community attitudes toward police on
delivery of police service, response time, or traffic
accidents. Of 648 individual statistical comparisons made to
produce the major findings, statistical significance occurred
only 40 times.
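
To put the "40 of 648" figure in perspective, the short sketch below works out how many significant results chance alone would be expected to produce. The counts come from the paper; the 0.05 significance level and the independence of the comparisons are assumptions made here for illustration, since the paper states neither.

```python
from math import comb

N_COMPARISONS = 648  # from the paper
N_SIGNIFICANT = 40   # from the paper
ALPHA = 0.05         # assumed significance level; not stated in the paper

# Number of "significant" results expected if no true effect existed at all.
expected_by_chance = N_COMPARISONS * ALPHA
observed_rate = N_SIGNIFICANT / N_COMPARISONS


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p); assumes independent comparisons."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


tail = binomial_upper_tail(N_COMPARISONS, N_SIGNIFICANT, ALPHA)

print(f"expected by chance: {expected_by_chance:.1f}")
print(f"observed rate:      {observed_rate:.3f}")
print(f"P(at least {N_SIGNIFICANT} by chance): {tail:.3f}")
```

Under these assumptions roughly 32 significant comparisons would be expected by chance, so 40 is not far from what would appear if patrol level made no difference.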

In July, 1973, Chief Kelley became Director of the Federal
Bureau of Investigation and in November, 1973, Joseph D.
McNamara became chief of the Kansas City, Missouri Police
Department. Chief McNamara quickly expressed his support of the
department's research orientation and efforts to improve our
reputation as one of the best police departments in the nation.

At the present time, the department is involved in three very
significant projects.

1. Directed Patrol -- This project was implemented in the East
   Patrol Division on July 1, 1976, and is a natural follow-up
   to the South Patrol Division Preventive Patrol experiment
   described above. Given the results of the South experiment,
   we felt obligated to develop more productive methods of
   utilizing the uncommitted time of patrol officers. One
   problem in doing so is the fragmentation of such time. The
   Directed Patrol program, developed by an East Patrol
   Division Task Force, has two major components. The first
   seeks to assess priority of calls for service, with some
   responses being delayed, some citizens being requested to
   come to the station to make reports, and some reports being
   taken by phone. This is an effort to realize uncommitted
   time of patrol officers in larger and more predictable time
   increments so that it can be utilized in planned and
   directed patrol activity. The second component involves the
   utilization of that time in various programs directed toward
   crime prevention. Financial support for the development of
   the project was provided by the Police Foundation and
   funding for implementation and evaluation is from an LEAA
   grant.
2. Response Time Analysis Study -- Police response time has
   long been assumed to be a very critical factor in police
   patrol effectiveness, especially with regard to apprehension
   of criminal offenders. A number of studies have previously
   been conducted, but none of sufficient scope and quality to
   prove or disprove traditional assumptions. This study is a
   very comprehensive and sophisticated project started on
   October 1, 1973. The continuum from crime or other police
   incident occurrence to contact between the responding
   officer and the citizen is being measured in minute
   intervals for the purpose of assessing the effects of
   variable response times on arrests, witness availability,
   victim injury, and citizen satisfaction. A secondary
   objective is the analysis of problems and patterns of
   citizens reporting crime. This study is funded by the
   National Institute of Law Enforcement.
3. Domestic Violence -- One of the many traditional assumptions
   in law enforcement is that the police are powerless to have
   any preventive effect on homicide and aggravated assaults
   because most of them occur between relatives or
   acquaintances, many are spontaneous, and/or most occur
   inside buildings or other locations not visible to police
   patrol. In 1972 a sergeant assigned to our Planning and
   Research Unit gathered and analyzed a large amount of data
   from police reports of homicides and aggravated assaults,
   arrest records, and dispatch records. He concentrated on
   those of domestic nature, which account for a major portion
   of the homicides and assaults. He found that in the two
   years preceding the offenses in a domestic setting the
   police had contact with either the victim or suspect, or
   both, in responding to and handling disturbance calls. In
   85% of these cases the police had at least one such previous
   contact and in 50% of the cases we had five or more such
   contacts. Is there something the police can do in these
   contacts to forestall a future homicide or aggravated
   assault? The East Patrol Division recorded data on numerous
   variables observed in the process of handling disturbance
   calls. There is a very strong indication that various
   interacting variables can provide some ability to predict
   potential for future violence between the participants of a
   domestic disturbance. If this is true, it is felt that the
   police can refer such people to an appropriate social
   service agency for assistance, thereby reducing the
   incidence of domestic homicides and assaults. In July of
   this year, the National Institute of Mental Health awarded a
   grant to the department for further analysis of the data
   collected and the collection of additional data.

It might seem to the reader of this paper that what has
transpired in the Kansas City, Missouri Police Department
insofar as research, experimentation and planning resulted from
a grandiose master plan or schedule developed years ago. Such
is certainly not the case. To a large extent, our efforts and
progress have been a reaction to contemporary events and
opportunities. One thing that was deliberate, and I'm sure
planned, was the creation by Chief Kelley of a climate within
the department which encouraged involvement and innovation.
Sincere and strong top management support for such is
absolutely essential to meaningful and successful efforts such
as have been discussed. Along with this strong support,
management must assume a facilitative role as opposed to a
strong directive role; an overly directive role stifles
initiative and participation of personnel within the
organization. All of our patrol task forces were initiated at
the same time in 1971. One of these task forces struggled much
harder and took much longer than the others to "get off the
ground" and start making some meaningful progress. There is
general agreement among those who monitored the process that
this was due to the fact that the commander of that division
was quite authoritarian in his personality and management
style.

The department has recognized and realized many benefits and
advantages of the task force approach. Some of the very
significant ones are:

1. It provides an environment for personnel development and
   enhances capacity to properly handle discretion.

2. It provides an opportunity to identify highly competent
   personnel at all levels of the organization.

3. It increases communication, coordination, and morale within
   the organization. Prior to the patrol task forces there were
   frequent requests for transfers to other parts of the
   organization. As the task forces got more involved these
   requests for transfer out of patrol decreased drastically
   and, in fact, we started receiving requests for transfer to
   patrol from other elements.

4. It improves the ease and success of implementation of change
   due to the involvement and vested interest of those affected
   by the change. Consider the statements of one of our
   officers who was involved in one of the patrol task forces:

   "They've said policemen fight change. Well, that may not be
   true. It may have been the method of change, rather than the
   change itself, that was resented. The patrolman wants change
   but he wants to have a part in deciding what that change
   will be."

There is no intention to create the impression that the task
force approach is appropriate for all circumstances or that it
does not have negative aspects. It is a slow and time consuming
process and increases the difficulties in controlling variables
during the evaluation phase of a project. We have also found it
not to be the best approach for very technical areas or issues
not a part of the everyday duties of the task force members.

Some keys to successful operational research

Based on my observations of our experiences in the Kansas
City, Missouri Police Department over the past decade, there
are several very key points in conducting worthwhile and
successful operational research.

The first thing which must exist is top management support and
commitment to such efforts and programs. Without this it would
be totally futile to try even the first step. This factor has
been discussed in some detail in the preceding pages of this
paper, but its importance cannot be overemphasized.

Another very important consideration is the meaningful
involvement of the personnel of the organization including, in
fact especially, those at the rank and file level. Again, this
has previously been discussed and stressed in preceding pages,
but bears repeating. Too many managers are inclined to believe
that the people of an organization are totally against any
changes, except increases in the pay check and decreases in
working hours, and that they will do all within their power to
resist change. That just is not so. They do like to play a part
in their destiny and it is to the organization's benefit to let
them do so. Of course, there will be individuals who are
exceptions but it has been our experience that the enthusiasm
and satisfaction generated within the majority results in peer
influences preventing those individuals from generating serious
or successful resistance. It should go without saying that the
reason for and subject of any research project or program must
be legitimate and have as its goal the improvement of the
organization and the service it provides. Research purely for
the sake of research should be taboo. If a manager cannot
project the justifications and potential benefits in a totally
convincing manner it must be questioned as to whether the
project or program is warranted.

Total honesty with the personnel of the organization is a
must. They must be truthfully informed of the purpose of the
research and the methods to be employed. I am aware of one
organization which utilized field observers to gather data for
their research. The rank and file were given a fictitious
account of what type of information the observers were to
gather. Once this deception became known the ability to collect
accurate and reliable data in that organization ceased to
exist. Even if the rank and file members are not an important
source of data for the research or are not otherwise involved
in the process, a lack of factual information will likely
result in rumors and inaccurate perceptions, thereby detracting
from the value and success of the research. In our department
we utilized various means in efforts to keep personnel
informed. Personnel directly involved in the projects were
urged to utilize every opportunity to communicate with their
peers, briefings on current projects were included in recruit
and in-service training classes, articles were printed in the
department newspaper, memorandums were written and distributed,
and projects were discussed in staff meetings. It takes a lot
of effort to keep information flowing to all parts of a large
organization but the dividends make those efforts worthwhile,
in fact necessary.

Operational research within a public service agency does
present problems which are not as likely to be encountered in a
product producing organization or one whose service is less
essential and visible. We must be continually responsive to the
public's needs and demands for our service, oftentimes on an
unpredictable and emergency basis. The research must be
conducted in such a manner that our ability for such response
is not compromised. The ground rules for assuring this must be
set forth at the beginning, and the research must be designed
and structured with full understanding and consideration of
these rules. This initial effort can avert many of the problems
that would be encountered, but there is no way to anticipate
all problems relative to conflict between the project and its
evaluation and what would be otherwise normal changes such as
personnel reassignments, changes in personnel deployment,
changes in organizational structure, changes in tactical
strategies, etc. When these conflicts arise those with primary
responsibility for project administration and those responsible
for operations in provision of the agency's daily service to
the public must confer and collaborate in resolving the
conflict in the proper and best interest of the public. This is
not as easily done as said but it is necessary and possible.
The Kansas City Police Department has certainly encountered
some very knotty problems of this type and a gentleman who will
speak to you, Dr. George Kelling, who was on the Police
Foundation evaluation staff for some of our projects, can
relate the details of some of those problems and their outcomes
better than this writer.






Evaluation results
and decision-making:
the need for
program evaluation

Lee Sechrest
Professor of Psychology
Florida State University
Tallahassee, Florida




This paper attempts to make the strongest possible case for systematic evaluation of programs and other interventions directed toward
the resolution of operational problems in service agencies. It is based on the premise that many administrators have not thought
through their own needs for information and the role that research data can play in effective decision making.




Making decisions in any complex, real-life setting is never a
unidimensional, or even a simple, process. In order to make
adequate decisions the wise executive knows that it is
necessary to have good information on the effectiveness of some
proposed act or intervention. For example, before deciding
whether to buy a certain type of emergency vehicle, a wise
executive would want to know whether the vehicle could do what
it was designed to do, whether it was engineered in such a way
as not to create more problems than it solved, whether it might
also produce some nonobvious benefits by making possible the
performance of other important tasks, and he would want to know
whether the vehicle was really the best of its type. All the
above established in the affirmative, the decision to purchase
the vehicle should not automatically be made. Other factors of
equal, and perhaps greater, importance would have to be
considered. First, economics would be important. The cost of
the vehicle would be important, and maybe critical. No matter
how good it was, an emergency vehicle might be beyond the
budget even imaginably available to the community, and even if
affordable, the vehicle might cost too much more than the
closest competitor. Practicalities might also be important if
it appeared that delivery of the top-rated vehicle might be
long delayed or if service might be unduly difficult. Political
considerations might arise. Suppose the emergency vehicle in
question were manufactured in the U.S.S.R.? No one would dare
recommend its purchase. But even if it were only manufactured
in another state and had to compete with a locally manufactured
product, it might be politically unfeasible to recommend its
purchase.

The complexities no more than hinted at above are severe
enough for the fairly ordinary affairs of public institutions,
e.g., purchase of cleaning supplies, revision of accounting
systems, deciding whether to stagger times of work shifts, but
they are increased almost immeasurably when decisions have to
be made in the context of ongoing and critical public services.
To revert to the example noted earlier, if the decision were
which model of a garbage truck to buy, the fact that one model
might result in slightly higher spillage than another would be
troublesome but scarcely beyond dealing with. When the problem,
however, is the purchase of emergency vehicles and the issue is
the saving, or possible saving, of lives, feelings run high and
decisions must take more factors into account. It follows,
then, that decisions in critical public services may not
reflect quite so clearly the harder, more factual information
on effectiveness of a proposed intervention.

The position taken here is that despite the complexity of
decision processes in such areas as emergency medical systems
-- and, as we shall see, police systems also -- data on
effectiveness based on careful evaluations is still an
important element in the decision process, even if the final
decision goes against evaluation results. An administrator may
find that a suggested change in operations would be
economically unfeasible, that it would be politically
unacceptable in his community, that it would be resisted too
strongly by employees at lower levels, and he might decide
against implementing a change even though on other grounds it
would be desirable. It is the contention of this writer that
the administrator should know exactly what he is sacrificing,
the price he is paying to maintain labor peace, to avoid having
to ask for additional funding. There is absolutely no advantage
in making decisions in which one of the important elements is
an unknown. If, for example, a proposed new emergency vehicle
would be little more effective than those already available and
the other costs are sizeable, the administrator's decision is a
simple one. If, on the other hand, the proposed vehicle would
actually perform significantly better and result in better
outcomes for emergency cases, the administrator can understand
his decision as an honest and rational one and can also take
comfort in the knowledge that if some of the other factors
change, e.g., economic situation improves, there is a good
basis for reconsideration.

Therefore, we can only recommend that administrators cooperate
in, indeed insist on, obtaining the best information possible
about program effectiveness since that information is not only
an important but critical element in managerial
decision-making.

What is program evaluation? 

At this point it might help to make clear just what is meant
by a program or an intervention, what is meant by an
evaluation, and what is meant by effectiveness. In the broadest
sense we mean by a program or an intervention any alteration in
an organization, including changes in personnel, in equipment,
or in operating procedures, that is intended to improve the
operations of the organization and make it more likely to
achieve its goals. When a rescue squad purchases a new
communications system, when a department of public safety
replaces an administrator judged ineffective, when an emergency
services delivery program is regionalized, when all rescue team
members are required to undergo some training program, these
are all instances of interventions of the type we have in mind.
Then when we say they should be evaluated we mean that some
process should be established to determine whether the intended
effects are achieved. If a baseball team in a slump fires its
manager, it is reasonable to keep track of performances of
individual players and of the team as a whole. If a new
communications system is purchased, then procedures should be
set up to determine whether communications are affected. Does
the delivery of emergency services change following
regionalization? Do trained ambulance attendants perform
differently as a result of their training? Then by
effectiveness we mean whether the change(s) is in the intended
direction, whether the change is about as large as was
anticipated, and whether there are unexpected additional
benefits or disadvantages resulting from the intervention. A
new rescue vehicle might not only be medically more desirable
but might improve morale and pride of the squad. A new
administrator might produce greater efficiency in operations
but also produce undesirable turnover in personnel over the
long run.

What we are recommending is that all changes should be considered to be
temporary, to be experimental, and that procedures should be established
to evaluate their effects. Perhaps that may seem an unrealistic
recommendation, but in our view to do less is irrational. There is not
much purpose in replacing one administrator by another in order to
improve organizational performance without having some way of knowing
whether the improvement takes place. It does not make much sense to buy
a new piece of equipment without having some plan for determining
whether it works better. It is obvious that different types of decisions
may be evaluated in different ways, and not all require formal study and
experimentation. Some evaluations occur in the normal course of events,
and if the risks involved in simply waiting to see what happens are not
too great, needed data will often emerge. There was a recent newspaper
article reporting that steel-belted radial tires are undesirable for
cars likely to be driven over 100 miles per hour because failure of heat
dissipation leads to blowouts. This fact was discovered because of
failure of such tires on police cars used in high speed chases. It does
seem just possibly a bit unfortunate for a police department proudly
outfitted with steel-belted radials on its cars to learn that such tires
are not such good choices right in the middle of a high speed chase.
Note that even in this case, however, the conclusion was made possible
by accumulating data across a number of different jurisdictions. Think
how long it might have taken for 100 one-car police departments
scattered around the country to learn the same thing. Obviously if a
major decision is to be made, or a decision is to be made which is not
easily reversible, simply waiting to see what happens is a weak
evaluation strategy.

Some evaluations are pre-performed to at least some degree.
Specifications for equipment, as an instance, are an attempt to ensure
that the equipment will perform as expected. From a strictly hardware,
technological standpoint, it may be possible to draw up and enforce
specifications in advance. Even in some other areas technology may be
sufficiently advanced that a change can be made with reasonable
confidence of effectiveness. For example, not every training program has
to be evaluated in every setting. Eventually one becomes confident that
a given type of training is a desirable thing. However, there are good
reasons for making conservative estimates of the probable effectiveness
of new programs and for making at least some probing efforts to
determine that the programs are having their desired effects.

It is tempting to think that at least some types of programs or other
interventions can be assumed to be effective, e.g., on logical grounds
or by analogy. Based on reviews of many other programs and innovations
in many other areas, we have concluded that it is risky, if not
downright hazardous, to assume anything about the probable effect of a
program. A large number of examples can be cited of programs and
practices which were assumed to be desirable or which became standard
practice before any evidence of effectiveness was available and which
have not only in some instances proven worthless but, worse, have on
occasion proven dangerous. It is also unfortunately the case that at
least some of these programs persist and even proliferate despite proven
ineffectiveness. However, before passing to specific examples, it might
be noted also that even if a program can be assumed to be desirable, to
be on the whole an improvement, it is much more difficult to know
whether any assumed benefits are proportional to costs. It may be
possible to demonstrate conclusively by purely technical evidence that a
new communications system will result in reduced dispatching time, but
if the system requires better trained personnel, renovation of space,
etc., it may prove deceptively expensive. But even if all those factors
are known, it may still be highly questionable whether the projected
decrease in dispatch time will be worth the costs.

Wastefulness of ineffective solutions to problems

The problem with ineffective "solutions" to problems is that they are
wasteful, usually in several ways, and hence should not be tolerated. In
these days of increasing pressures for accountability on the part of
public institutions, it is going to be increasingly necessary to produce
positive evidence of effectiveness of new programs and changes in old
ones. Ineffective programs are, quite obviously, wasteful of resources:
space, time, talent, money. The city of Miami Beach, Florida, mandates
that a physician ride along on every emergency vehicle run. If that
physician does not in some substantial degree improve the results of
emergency runs, then money, a good bit of it, and talent that could well
be used elsewhere are being wasted. However, at a less obvious level
than the wasting of resources, ineffective programs are wasteful because
they often involve substantial and important opportunity costs, i.e.,
money or energy invested in one enterprise is not available for other,
perhaps much more productive purposes. A relatively obvious opportunity
cost is the economic one: purchase of one $13,000 emergency vehicle
means that two $6,500 vehicles cannot be purchased. The hiring of a
full-time emergency physician may mean that two fewer nurses can be
employed. Money spent to install radiographic equipment in an emergency
room will not be available to renovate space to improve work-flow.

It needs also to be recognized that ineffective programs may be worse
than simply wasteful because they detract attention and energies from
problems badly needing solution. For example, it has been noted that
almost any anti-delinquency program, even if it is quite ineffective,
reduces public anxieties about the problem and any resulting pressures
for a solution. It has been argued that every ineffective delinquency
program sets the field back about five years because that is how long it
takes to discover that it is not working. The situation cannot be
different in the health field generally and in emergency medical
services delivery specifically. Think of the many changes in the EMS
field that have been made with the promise, but not the demonstration,
of effectiveness which have been or may now be called into question. And
think how those very changes have retarded further explorations into the
problems involved. We want to reiterate the point here that we believe
it essential to plan for the best possible evaluation of every change,
or innovation, or new program. We believe that absolutely nothing about
effectiveness can be assumed.

Examples of unevaluated bad ideas

Perhaps it might help at this point to give a few examples of how reason
and logic have led to erroneous conclusions, sometimes with results that
have been quite unfortunate. A good initial example, because it pertains
to the training of personnel involved in delivery of critical public
services, is the set of assumptions that has long existed about
appropriate training for police personnel. Since it is evident that
police are often subjected to considerable stress, that they must cope
with danger, harassment, enforced quasimilitary discipline, and the
like, it has seemed evident to just about everyone that police training
should prepare officers for those very experiences by providing
occasions, preferably numerous, of a high degree of realism, on which
they can practice the appropriate responses. Consequently police
training has been militaristic, physically and emotionally demanding,
marked by stern and stressful discipline, etc. A few years ago it
occurred to H.H. Earle (1973), an officer in the Los Angeles County
Sheriff's Department, that the assumptions on which so much of police
training has been based just might be wrong. So he developed an
alternative training program characterized by relaxed discipline,
rational exercise of authority, minimization of artificially induced
stress, and the like. Half of the recruit class were assigned randomly
to the traditional training program and half to the new experimental
program. The experimental program proved to produce patrolmen better in
every respect, both at the conclusion of training and upon follow-up.
The experimentally trained class were even judged later to wear their
uniforms better, and they scored significantly better in marksmanship.
Can anything about the training of EMTs or paramedics be taken for
granted?
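
The random assignment used in the police training study can be sketched
in a few lines. What follows is only an illustration under stated
assumptions, not Earle's actual procedure: the function name
assign_randomly, the group labels, and the numbered recruits are all
hypothetical. The point is that chance alone, rather than anyone's
judgment, decides who receives which program, so that later differences
between the groups can more plausibly be credited to the training.

```python
import random


def assign_randomly(recruits, seed=None):
    """Split a recruit class at random into two groups of (nearly) equal size.

    Hypothetical helper for illustration only; group labels are assumptions.
    """
    rng = random.Random(seed)      # private generator so a seed makes the split reproducible
    shuffled = list(recruits)      # copy, so the caller's roster is left untouched
    rng.shuffle(shuffled)          # chance, not judgment, orders the roster
    half = len(shuffled) // 2
    return {
        "traditional": shuffled[:half],   # first half gets the existing program
        "experimental": shuffled[half:],  # remainder gets the new program
    }


if __name__ == "__main__":
    groups = assign_randomly(range(1, 21), seed=7)
    for name, members in groups.items():
        print(name, len(members), sorted(members))
```

The same scheme would apply to a class of EMT trainees divided between
two curricula. The fixed seed is used here only so that the split can be
reproduced and audited afterward.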

Over the years one of the convictions that has been prevalent about
delinquent youth is that they come from rather generally disturbed
families and that they need some sort of substitute parent, e.g., a "big
brother," at least to tide them over, to help provide some of the
attention and warmth that they fail to get at home. In the meantime, the
family should receive some sort of therapy or counselling. A recently
published study by the Institute for Social Research at the University
of Michigan suggests that not only are those assumptions not tenable,
they may in part be absolutely wrong. An experimental test of the
"volunteer" delinquency worker program showed that it is, at best, of no
value, and a further study showed that requiring the families of
delinquents to participate in counselling programs was worse than
leaving them alone (Berger & Gold, 1976).

The above are but two of many examples that could be adduced. Anti-drug
abuse programs based on the very best of assumptions have proven
generally worthless. The logic of probation and parole is inescapable,
but neither seems to work at all. The state of Maine has recently, by
action of the legislature, given up on parole altogether. When prisoners
are released, they are released, and that is it. Although controversial,
a recent report on the effectiveness of rehabilitative techniques with
criminal offenders (Martinson, 1974) concludes that there is no
rehabilitative power, however logical and appealing, that produces
results in any dependable way.

The medical and health fields can provide as many, and equally good,
examples. Cardiac Intensive Care Units may be of little or no value and
even harmful in some cases. Coronary artery bypass surgery is quite
logical and, on the evidence, little justified. Health Maintenance
Organizations are proliferating around the country because they seem
like a very good idea. There is as yet no evidence of their
effectiveness and some modest pieces of evidence suggesting that they
may be of little value. Health education is clearly a good idea, but at
least as it has been implemented, it is a waste of money and effort. An
interesting note on health education comes from Victor Weingarten,
President of the Institute for Public Affairs, who found that five major
voluntary health agencies were spending more than $100 million per year
for health information programs. Yet over a period of ten years there
were only two instances of any attempt by any of the agencies to
evaluate any of the material. An insurance company spending $2 million
per year for health information has never had an evaluation of the
materials over a period of 20 years (Weingarten, 1974). A great deal of
money and effort is being invested in the development of PSROs with
almost no evidence at all that they will have their intended effects and
with distinct risks that they will have quite undesirable side effects.

Two examples involving monetary considerations are of special interest.
New York Bell Telephone Company concluded that they were spending too
much money providing information services to subscribers who ought to
look up the numbers in the telephone directories. They calculated that
by instituting a charge for information service, which involved a
commitment to refund $.30 to every subscriber not using information
service, the company could save a great deal of money. However,
subsequent to the invoking of the information service charge, there was
such an enormous increase in requests for directories, accompanied by
unanticipated costs in refunding the $.30 to the huge number of
subscribers who proved not to use information, that the company was
faced with a very sizeable net loss, a figure around $2 million. A
relatively small scale experiment might well have suggested what did in
fact happen. Another example involving money is the hospital
precertification program which was supposed to save Medicare and
Medicaid funds by providing assurance that every hospital admission was,
in fact, medically justified. However, precertification involves costs,
and Drs. Thomas Bice and David Salkever are currently analyzing data
which suggest that the "certificate of need" in fact resulted in a net
increase in costs of hospitalization, probably by about $5.00 per
hospitalization (Bice, personal communication). Not much, but when
aggregated across all federally supported hospitalizations the total is
fairly important. Again, an experimental trial of precertification might
have helped. A trial (carried out in Hawaii) of review of ambulatory
care for appropriateness of treatment indicated that such review is
probably not cost effective, i.e., it costs more to conduct the review
than is saved by reducing inappropriate treatment costs (The Hawaii
EMCRO, 1973).

The treatment of patients in medical emergencies provides other
examples, especially pertinent in this context, of inadequately
evaluated treatments, some of which were taken for granted with some
unfortunate results. Standard treatment for burns, as an instance, for
many years called for administration of intravenous calcium along with
massive blood transfusions, a practice now regarded as harmful because
the large amounts of calcium may induce cardiac systole. The
Trendelenburg position (head down) for shock victims was recommended
after World War I on the basis of experience with pelvic surgical
patients, and on that basis alone it was accepted as good practice for
50 years or more. It is now known that that position is wrong, the
preferred position being with the patient's torso flat and the legs
somewhat elevated. The Trendelenburg position example does illustrate
the problem that arises when a treatment is better than some known
alternatives, e.g., it is better than having the patient flat or with
head elevated, but not the best alternative available. A partially
effective treatment or other intervention may inhibit research to a very
powerful degree.



Evaluation: begin at the beginning

If good evaluation is to be accomplished, we believe firmly that it must
be planned for, and in fact it should be built in during the initial
stages of program planning and development. Once they are underway,
programs have a strong tendency to develop their own internal logic and
momentum so that it is very difficult to probe into them to determine
their effectiveness, let alone to change them. The very examination of a
program from the standpoint of its outcomes becomes quite threatening.
People become identified with programs and develop a proprietary
interest in them at the very least. In some instances the interest
becomes material. As an example of the former, it is very clear that any
proposal to evaluate the performance and effectiveness of volunteer
rescue squads would be likely to meet with great resistance from the
squads to be evaluated. But the resistance would not be any less if one
were to propose a comparative evaluation of emergency rooms operated
under hospital control and those operated by contract with an outside
firm of emergency physicians. The Experimental Medical Care Review
Organization (Evaluation of Hawaii EMCRO, 1974) in Hawaii engendered
great hostility in the local medical community when it published a study
interpretable as indicating that subscribers to the Kaiser Permanente
prepaid health plan might be receiving better medical care than those
citizens seeking attention from private practitioners. The best way to
maximize the chance that an evaluation can be properly and correctly
carried out is to build it into the program plans from the beginning.



Evaluation is often expensive

The potential expense of research cannot be glossed over. Program
evaluation is rarely cheap, or at least rarely both cheap and good.
However, one's perspective on the cost of research has to include the
cost of the program or the treatment to be implemented, in some cases
the cost accumulated over a good many years. The perspective also has to
include some estimate of the likelihood that the change or intervention
planned might actually be harmful, the likelihood that whatever bad
effects might result would be reversible and at what cost, and the
likelihood that a program might become a model for wide implementation.
Even very expensive research may be worthwhile under some circumstances.
For example, one group was asked to develop a plan to evaluate the
effectiveness of an areawide EMS for which a federal grant of about
$900K had been received. After due thought to the problems involved the
planning group came up with an evaluation proposal which would have cost
about $1.5 million, a result which caused a great deal of amusement and
even derogation in some quarters. However, there are now more than 200
regional EMS, with many millions of federal dollars being spent, and
still with very little good information on which to make a judgment of
what is happening.

Many similar examples can easily be found. There was a $3 million
proposal to evaluate the performance of seven nurse practitioner
(PRIMEX) programs, each graduating only a few trainees each year. Viewed
as an evaluation of the seven specific programs the research would
clearly have been dreadfully expensive. On the other hand, viewed as an
evaluation of prototype programs for potential nationwide
implementation, the research could have been considered a real bargain.
Evaluation of 911 systems is not being accomplished, in part because the
cost of evaluating any one installation would seem disproportionately
great in relation to the cost of the system, say in one or two counties.
Yet the aggregate cost of 911 systems across the country will be
staggering, and they will all be in place before anyone discovers
whether it is really a good idea or not. By that time it will be too
late.

Heavy expenditures for research can also be justified when risks of bad
outcomes are substantial and when those outcomes might not be easily
reversed. How much would it have been worth, for example, to have done a
definitive evaluation of the effects of thalidomide? Utilization of
various paramedical personnel would not seem to be completely without
risk, and at least some of the risks that are imaginable are also
substantial, and the expenditure of fairly large sums of money to
evaluate performance of paramedical personnel would seem completely
justifiable. Some changes or innovations need careful evaluation,
preferably in a limited experiment, because they tend to be
irreversible. It seems scarcely likely, for example, that it would ever
be possible to get the law changed so as to permit untrained ambulance
attendants to function again, even if EMTs proved not to be any better
in performance. It will be difficult, perhaps impossible, for any
community to abandon its 911 system once it is in place. Nearly all of
the costs are incurred in start up, and by the time the system might be
found to be no better than previous systems, it would be too late. A
volunteer rescue squad, once replaced by hired staff, might be
extraordinarily difficult to assemble again.

To reiterate, research very often costs a lot of money in absolute
terms. Whether it is relatively expensive and worth doing depends on a
number of other factors, including especially whether a research effort
is viewed as addressed to a specific time and space limited problem or
whether it is addressed to a problem better considered as extensive in
time and space.
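
The difference between the two perspectives can be made concrete with
simple arithmetic. The sketch below is illustrative only: the function
name evaluation_cost_ratio is hypothetical, the dollar figures loosely
echo the areawide EMS example given earlier, and the supposition that
each of 200 regional systems costs about as much as the one evaluated is
an assumption made for the illustration, not a reported fact.

```python
def evaluation_cost_ratio(evaluation_cost, program_cost_per_site, n_sites=1):
    """Return evaluation cost as a fraction of total program spending.

    Hypothetical helper: total spending is taken to be
    program_cost_per_site * n_sites, i.e. every site is assumed to cost the same.
    """
    if program_cost_per_site <= 0 or n_sites < 1:
        raise ValueError("program cost must be positive and n_sites at least 1")
    return evaluation_cost / (program_cost_per_site * n_sites)


if __name__ == "__main__":
    EVALUATION_COST = 1_500_000  # illustrative figure (assumed)
    PROGRAM_GRANT = 900_000      # grant for the single areawide system
    N_REGIONAL_SYSTEMS = 200     # assumed to cost about the same each

    local = evaluation_cost_ratio(EVALUATION_COST, PROGRAM_GRANT)
    national = evaluation_cost_ratio(EVALUATION_COST, PROGRAM_GRANT, N_REGIONAL_SYSTEMS)
    print(f"Viewed as a study of one system: {local:.1%} of program cost")
    print(f"Viewed as a study of a prototype: {national:.1%} of program cost")
```

Under these assumptions the same evaluation costs more than the program
when judged against one system and under one percent when judged against
all the systems it could inform.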

More basic research is needed

One of the distinct impediments to the kind of research which all of us
would like to see done on EMSs, and many other health programs, is that
so much basic research, preparatory research, needs to be done, and
there is so little impetus and enthusiasm for doing it. We would all
like to know whether trauma centers save lives, whether EMT-A training
is worthwhile, whether it would be worthwhile to reduce rescue squad
response time from ten to eight minutes. But we do not know how to
measure outcomes, or even whether that measurement is possible. It is
disturbingly difficult even to get basic data on emergency medical
services: what proportion of ambulance runs involve unconscious victims?
what proportion of ambulance runs involve multiple victims? what
proportion of ambulance runs involve burn victims of what degree of
severity? on what proportion of runs is basic and effective assistance
already being rendered at the scene? The list is virtually endless. The
answers may be available, but they are certainly not readily available,
and the unavailability of answers to just such simple questions is
retarding research efforts. One cannot, for example, expect to evaluate
EMT treatment of burn victims if there are very few burn cases handled.
Nor can one evaluate very well the handling of cases for which there is
little variability, as might be the case for certain types of relatively
minor injuries for which the treatment would be obvious. As yet very
little is known about the way rescue teams actually function, and until
that knowledge is obtained, it will be difficult to advance in other
areas. Unfortunately basic research, even in applied areas, is often
tedious, has low immediate payoff, has very little payoff of any kind to
the agencies that are the subjects of the research, and is not very
glamorous. It is, unfortunately, only critical.

Generalizability of findings

There might be some confusion created by some of the above discussion
because there have been repeated jumps from local to national problems,
from little to big problems, etc. It is apparent that the national
interest in EMS research cannot be satisfied by purely local problems
and issues. Whether a new director of a department of public safety will
do a better job than his predecessor is not an item of interest beyond
the locality in which the problem resides. Whether in a given community
rescue squads should be kept together in teams or shifted around for
convenience in scheduling is not a question of much interest in
Washington, D.C. Nonetheless, we do want to affirm our belief that even
local agencies would do well to have evaluators available to determine
the effects of even such limited and local changes, whether the
evaluators are regular staff members or consultants. We believe that it
is important for local public agencies to know what they are doing and
what effects they are having. However, we would also like to suggest
that the perspective that one takes on a problem may determine whether
it is of purely local interest or whether it has more far reaching
implications. The question of replacing Chief Jones with Chief Smith is
not very interesting, but the question whether replacement of Chiefs
makes any difference when things are not going well is at least a
potentially interesting question. One investigator has been able to show
that when baseball teams change managers, performance of the team
generally improves (Grusky, 1963). Could the same be true of the EMS?
Similarly, the question of scheduling of rescue squad workers is of more
general interest if one asks whether workers consistently assigned
together function more efficiently and effectively, whether they tend to
develop role specialties, and other like questions.

In any case, it should be clear that the interest of federal agencies is
in research that contributes to the general body of knowledge about the
working of EMS and, at least in the longer run, to the development of
policies to guide federal support for EMSs. No matter how praiseworthy
on other grounds, a service program to benefit a local community cannot
qualify as research. Still other research is of such parochial nature
and so far removed from interests of federal policy that it would not be
likely to engender much interest at the federal level. For example, what
sort of uniform would be most suitable for EMTs in Houston might be an
issue of some concern there, but the implications beyond that community
would very likely be limited and probably (many would hope) beyond the
policy interests of the federal government. Research will be of greatest
interest when it is addressed to problems of rather broad concern, when
it promises to provide new information, when that new information will
be of value in understanding the basic processes of EMS functioning, and
when the results are likely to be translatable into policy statements
and action implementations.

Problems to be resolved

We do not want to gloss over any of the problems or limitations involved
in the type of research and systematic program evaluation we are
proposing here. Both the problems and limitations are numerous and
severe, so much so that they remind us of Winston Churchill's comment
that democracy is a terrible form of government, having as virtually its
only strength the fact that it is preferable to any alternative. What is
the alternative to determining whether programs or treatments work?
Prof. Frederick Mosteller has suggested that the only alternative to
experimenting with people is to fool around with people (see Gilbert,
Light, & Mosteller, 1975).
One distinct limitation on evaluation is that administrators must often
make decisions in a time frame that does not encompass the determination
of effectiveness of a proposed change. We suspect that at least some of
the urgency of decision making may be exaggerated, but nonetheless, if
an incompetent person must be fired and replaced, there will be no time
to evaluate the effects of the replacement. The reorganization of a
hospital and community health system may force changes in emergency
medical services which cannot be evaluated before being made. However,
in our view such problems merely reinforce in the strongest way the case
for doing research and evaluation whenever it is possible. By having
available a good data base, by having access to a fund of accumulated
research, by knowing the results of evaluations of programs similar to
the one being considered, it should be possible to make more
intelligent, informed decisions with a much higher probability of
payoff. Thus, for example, the twenty-five year research program of
Prof. Fred Fiedler on effectiveness of different types of leaders in
different types of settings provides at least the possibility of doing
better in the replacement of an executive than merely hoping that the
most available candidate will be an improvement (e.g., Fiedler, 1971).
Enough is known about media campaigns to inform the public about some
service that one need not start from scratch in designing an information
campaign about a 911 system, e.g., we know that public service TV
announcements are rarely broadcast at prime times. Whatever information
is available about organizations, programs, etc., has come from research
which was done when it was possible. The opportunity to do a good piece
of research is not a regular occurrence, and no good opportunity should
be passed up.

The work of Nathan Caplan of the Institute for Social Research at the
University of Michigan has shown that there are some fairly clear
limitations on the utilization of research findings in policy decisions
(Caplan, Morrison, & Stambaugh, 1975). One of the clearest limitations
was the reluctance of policy makers to consider the use of research not
done in their own settings. That is a very serious limitation if it
persists, because it is obviously impossible to replicate every bit of
research in every setting. In some degree there is going to have to be
an effort made to educate administrators to the use of research findings
and to decrease their parochialism and sense of uniqueness and their
fears of being wrong on occasion. Perhaps more stress by researchers on
the more generalizable features of their work would be helpful, and that
suggests again the importance of the perspective in which the work is
viewed. While it is true that no two cities, nor any two hospitals, nor
any two rescue squads are quite alike, it is similarly true that no two
cities, etc., are entirely different. One needed and promising line of
research that could be carried out as easily in the EMS field as any
other is the conditions under which change occurs, innovations are
disseminated, and research findings are utilized.

We would like to conclude this section by reverting to the point with
which we began. The making of decisions about provision of public
services is a complex matter that must take economic, logistical,
political, and other realities into consideration. However, we believe
that the effectiveness of a proposed change, innovation, or program is
an equally vital reality which must be a factor in the decision of an
administrator. We would grant that for political purposes an
administrator might very well adopt a program known to be ineffective or
of little worth, but that decision is better made in full knowledge of
the program's lack of worth, even if the administrator then runs the
risk of being considered cynical. Perhaps it is better to be cynical
than to be gullible and naive. An administrative body such as a city
council may not want to vote funds for a program because of fear of
citizen reaction to higher tax rates, but we believe that those citizens
are better served if the city council fails to enact a program in full
knowledge of its actual social worth. When I buy a car, its performance
characteristics are not the only factor affecting my decision, but I
want to know them. Ignorance is bliss only until, inevitably, its
consequences catch up with you.



References

Berger, R., & Gold, M. Experiment in a juvenile court. Ann Arbor,
Mich.: Institute for Social Research, 1976.

Caplan, N., Morrison, A., & Stambaugh, R.J. The use of social science
knowledge in policy decisions at the national level: a report to
respondents. Ann Arbor, Mich.: Institute for Social Research, 1975.

Earle, H.H. Police recruit training: stress vs. nonstress.
Springfield, Ill.: C.C. Thomas, 1973.

Evaluation of Hawaii EMCRO. Report under contract HSM 110-73-526 to
National Center for Health Services Research by A.D. Little, Inc.,
Cambridge, Mass., 1974.

Fiedler, F.E. Validation and extension of the contingency model of
leadership effectiveness: a review of empirical findings.
Psychological Bulletin, 1971, 76, 128-148.

Gilbert, J.P., Light, R.H., & Mosteller, F. Assessing social
innovations: an empirical base for policy. In C.A. Bennett & A.A.
Lumsdaine (Eds.), Evaluation and experiment: some critical issues in
assessing social programs. New York: Academic Press, 1975, pp. 39-193.

Grusky, O. Managerial succession and organizational effectiveness.
American Journal of Sociology, 1963, 69, 21-31.

Martinson, R. What works? Questions and answers about prison reform.
The Public Interest, 1974 (Spring), 22-54.

Hawaii EMCRO: An experiment in non-punitive peer review. Project
Report. Grant No. 5 R18 HS 00795 SRC. National Center for Health
Services Research, Rockville, Maryland, 1973.

Weingarten, V. Report of findings and recommendations of the
President's Committee on Health Education. Health Education
Monographs, 1974, 2 (Supplemental), 11-1(






Evaluation Research:
What Is It and How Is It Done?

Linda Victor Esrov
Psychology Department
Florida State University
Tallahassee, Florida






The term "evaluation" is currently being used in several different ways
with widely different implications for how evaluations should be
carried out. While there is no one definition of evaluation that can be
claimed to be the correct one, there are some evaluations that are more
penetrating than others. It is important to know in what sense the term
is being used when evaluations are said to be desired or to have been
accomplished. In this paper Linda Esrov, an evaluation research
methodologist, describes the types and levels of evaluation that are in
current favor.

Over the last ten years or so a confusing variety of activities have
been lumped together under the heading of evaluation research or
program evaluation. This diversity is so pronounced that I assume that
many people, upon picking up a volume entitled "Final Program
Evaluation Report," would be hard pressed to predict much of anything
about what type of information is inside. Because of this diversity
authors who have tried to provide a comprehensive definition of program
evaluation, one that covers all of the types and levels of evaluation
activities, have been forced to produce broad generalities such as the
following: Program evaluation is any assessment or information that
allows one to reach decisions on programs (Bernstein & Freeman, 1975).
The vagueness of this definition is a testimony to the fact that being
more explicit would have excluded somebody who was doing something that
he/she called evaluation research. This definition does, however, make
the contribution of asserting the purpose of evaluation research or
program evaluation. It has a generally agreed upon, applied purpose,
that is, to aid decision-making concerning programs. However, this
definition leaves unspecified at least two important considerations:

(1) the level of the evaluation (i.e., what is it about the program
    that is being assessed or evaluated),

(2) the methodology of the evaluation (i.e., how is the assessment or
    evaluation to be done).

If one includes these two specifications in a definition of program
evaluation, the definition no longer refers to the multitude of
activities undertaken in the name of evaluation. Instead, it defines a
specific type of evaluation and consequently excludes other types. For
example, one might define what is generally believed to be the most
scientifically defensible type of program evaluation, namely evaluation
as a controlled experiment (e.g., Suchman, 1967; Campbell, 1969; Weiss,
1972; Riecken & Boruch, 1974; Bennett & Lumsdaine, 1975), as the use of
the social science methodology of the controlled experiment to assess
the extent to which a program is successful in bringing about the
desired changes in the target population. This can be viewed as one
type of evaluation. According to this definition what is being
evaluated is the program's outcome or effectiveness in producing change
and the method to be used is that of the controlled experiment.

As has been mentioned, however, there are numerous definitions of
program evaluation in addition to this one of evaluation as a
controlled experiment. It is being suggested here that one of the
reasons for the diversity is that different people are talking about
different types of evaluation activities when they define program
evaluation. It is also proposed that two characteristics, 1) level
(what is being evaluated) and 2) methodology, vary across different
definitions of evaluation research, and therefore should be useful as a
means to classify different types of evaluation. Accordingly, these two
characteristics will be used to develop a descriptive classification
scheme that will attempt to include most of the activities that are
currently labelled evaluation research or program evaluation. The
rationale for such a scheme is to provide descriptive information so
that one is better able to differentiate among various evaluation
activities and hopefully to reduce some of the confusion that is
related to the term "program evaluation". In addition to the
description of different types of evaluation activities, an attempt
will be made to point out each type's contributions to decisionmaking
along with its limitations. Examples of evaluations from Emergency
Medical Services will be considered within this framework.

Levels of Evaluation: What Is Being Evaluated?

Of the possibilities as to what it is about a program that is to be
evaluated (i.e., assessed in order to aid program decisionmaking) five
will be identified. These levels are:

(1) program planning or objectives
(2) program implementation or structure
(3) program operation or process
(4) program's production of desired change or outcome
(5) program impact.

When assessing level 1, program planning, one is dealing with the
characterization of the social problem area including what it is that
needs improvement. This also includes the definition of programmatic
elements and the setting of goals and objectives.

When assessing level 2, program implementation or structure, one is
dealing with the inputs of the program such as resources, equipment,
manpower, facilities, etc. Often administration is included.

When assessing level 3, program operations or process, one is dealing
with the performance of daily program activities: the services
delivered, the practices, strategies and intervention efforts.

When assessing level 4, the program's production of desired change or
outcome, one is dealing with the overall effectiveness of the program
to meet its predetermined objectives. These objectives usually relate
to measuring improvements or changes in the target population.

When assessing level 5, program impact, one is dealing with outcomes
that extend beyond the specific individuals who are served by the
program, that is, the effect of the program at the broader community
level.

In viewing these five levels of evaluation or what is evaluated, it can
be seen that they evolve from the immediate consideration of deciding
what form the program is to take (level 1) to the intermediate concerns
of producing the program and delivering its services (levels 2 and 3)
to the ultimate notion of determining if the outcomes, of both
individuals and community, were what was desired (levels 4 and 5). It
may be that evaluation at each of these levels can profitably
accumulate to produce a particularly comprehensive program evaluation.
However, even if evaluation is not to be carried out at all of these
levels, it will be suggested later that a number of combinations of
these levels of evaluation are very compatible due to certain
methodological issues.

The importance of recognizing the level of evaluation with which one is
dealing should not be underemphasized. One of the two most obvious
shortcomings of many evaluation projects results from the lack of
recognition of what it is that is actually being evaluated. The mistake
often made is to assume by demonstrating success at one level that
success has also been demonstrated at another higher level of the
program. This should be recognized as an unverified assumption. For
example, just because a program was implemented as planned or according
to certain standards, its effectiveness in producing the desired change
in its population has not been demonstrated.
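
The point about unverified assumptions can be made concrete with a
small sketch. It is only an illustration of the argument, not a
procedure taken from the paper: the names `Level` and
`unsupported_levels` and the sample report are assumptions introduced
here. The idea is that every level at which success is claimed must
itself have been evaluated; success shown at a lower level is never
carried upward.

```python
from enum import IntEnum


class Level(IntEnum):
    """The five levels of evaluation, ordered from planning to impact."""
    PLANNING = 1
    IMPLEMENTATION = 2
    OPERATION = 3
    OUTCOME = 4
    IMPACT = 5


def unsupported_levels(evaluated, claimed):
    """Return the claimed levels that were never actually evaluated.

    Both arguments are sets of Level members. A claim at a level is
    supported only if that same level appears among the evaluated ones.
    """
    return sorted(claimed - evaluated)


# Hypothetical report: implementation was checked against standards,
# but the report also claims the program was effective (level 4).
evaluated = {Level.IMPLEMENTATION}
claimed = {Level.IMPLEMENTATION, Level.OUTCOME}
gaps = unsupported_levels(evaluated, claimed)
print([level.name for level in gaps])
```

Here the claim about outcome is flagged because only implementation was
assessed, which is the situation described in the example above.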

Methodology of Evaluation: How Is It Done?

In order to deal with a given level of a program in an evaluative
manner one must use some means of assessing worth, value, or success.
It should be recognized that an evaluative assessment is always a
comparative process. There can be no absolute evaluation. If a program
is asserted to be effective or successful, some type of comparison or
contrast has been made. This comparison may be implicit or quite
explicit. For example, on an implicit basis the comparison may be that
this program is as good as other programs that one has the impression
are successful or that this program is much better than one's
impression of many other programs. The comparison process can also be
made much more explicit. As will be discussed, the use of experimental
design formalizes the need for comparisons through the use of
comparison groups or control groups.

The need for comparisons in order to reach valid evaluative conclusions
should be emphasized. The second of the two most obvious shortcomings
of many evaluation projects is that they often claim more than their
methodology can show. Many studies make what Campbell and Stanley
(1966) call the "error of misplaced precision". These studies attend at
great length to the collection of data concerning one program but are
little concerned with the comparison of what conditions would be like
or what results would be produced without the program or with an
alternative program. The error is often to assume that all of the
details that one has measured are causally related to the one program.
This cannot usually be demonstrated without explicit comparison unless
it is completely implausible that anything other than the program
itself could have produced the results. In the realm of social programs
this state of certainty does not usually exist.

Of the possibilities as to how to do an evaluation, that is, what
methodology is used, four methods will be identified along with
comments on their limitations and assets. The four methods are:

(1) description,
(2) informal evaluation or reliance upon common sense,
(3) comparison with standards,
(4) experimental design.

As a method, description is meant to be taken literally. It refers to
the systematic characterization or description of a situation or area
of interest in an accurate and comprehensive manner. In a sense,
description is nonevaluative and the addition of one of the other
methods (informal evaluation, comparison with standards, or
experimental design) applied to description produces an evaluation.
Description is included separately here because the collection of a
descriptive data base is such an extensive part of many program
evaluations.

There are many familiar examples of descriptive methodologies. For
example, case studies can be purely descriptive accounts of situations,
persons, or events. Surveys seek to provide descriptions and use
sampling methods so as to insure that the results are representative of
a certain population. Other examples of the use of description are task
analyses, job descriptions, and critical incidents reports.

The method of informal evaluation is equivalent to the application of
conventional wisdom or the use of one's "common sense" in order to make
judgments. Informal evaluation can be characterized by its dependence
upon casual observation as the source of information and implicit goals
as the criterion of value or success. It is the unsystematic use of
subjective judgment to determine worth and really is the embodiment of
our everyday understanding of the nontechnical word evaluation.

The problem with recommending informal evaluation is the likelihood
that it will be of variable quality. There is no doubt that at times
informal evaluation can be extremely insightful. On the other hand,
informal evaluation can also be superficial and distorted and produce
invalid decisions as a result of the reliance upon unrepresentative
anecdotes and unchecked impressions. The problem becomes one of how to
separate accurate from faulty impressions.

Using comparisons with standards as an evaluative method does include
the important consideration of making the comparison process explicit.
The measurement process itself is therefore usually very objective and
the standards can usually be subjected to empirical test. As will be
discussed, the validity of this approach, however, depends upon what it
is that is being evaluated (i.e., the level) and the validity of the
chosen standards.

The methodology of experimental design is a purposeful and explicit
approach to comparative measurement. This method is particularly
well-suited to determining which of two or more treatments or programs
is more effective or more successful. The classical experimental design
in its simplest form incorporates two important ideas: random
assignment of units (such as patients, hospitals, etc.) and a control
or comparison group. As Boruch (1974) has noted, this comparison often
takes one of two forms: the historical comparison, which is the basis
for time series designs, compares the condition of the target group
after the introduction of the program with the condition prior to the
introduction. The contemporary comparison, which is the "standard"
control group, makes a comparison between the target group receiving
the program and a control group sampled from the same population as the
target group, but not receiving the program. A comparison of the
differences between these two groups is taken as an estimate of the
program's effects.
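
The two forms of comparison can be written out as a short sketch. This
is an illustration under stated assumptions rather than a method given
in the paper: the function names are hypothetical, the difference in
group means is used as the simplest possible summary, and the morbidity
scores are invented for the example.

```python
from statistics import mean


def historical_comparison(before, after):
    """Change within the target group: mean after minus mean before."""
    return mean(after) - mean(before)


def contemporary_comparison(target, control):
    """Difference between groups: target mean minus control mean."""
    return mean(target) - mean(control)


# Invented morbidity scores (lower is better); not data from any study.
target_before = [12, 15, 11, 14]
target_after = [9, 10, 8, 9]
control_after = [11, 12, 10, 11]

print(historical_comparison(target_before, target_after))
print(contemporary_comparison(target_after, control_after))
```

With these made-up numbers the historical comparison shows a drop of
four points, while the contemporary comparison shows a difference of
only two, because the control group also improved over the same period.
The gap between the two estimates is one reason the choice of
comparison matters.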

Therefore, in its simplest form the classical experiment is a situation
where a randomly chosen half of the units under study receives the
program or treatment that is being evaluated and the other half does
not receive the program. These groups are then measured on the variable
of interest (for example, morbidity) and a comparison is made between
the outcomes for each group. As a result of the controlled comparison
and randomization of units this method has the ability to show the
degree to which the measured results were attained as a result of the
program or treatment. Thus experiments attempt to establish causal
relations; e.g., was the program or treatment the cause of the observed
changes in morbidity? The importance of random assignment to groups
should be stressed because if a comparison group is chosen by any other
method one of the following two assumptions is required:

(1) the comparison group is identical to the treatment group in all
    other factors except for the treatment being studied,

(2) one can correct for any of the relevant differences between the
    control group and the treatment group.

It should be pointed out that it is often difficult, if not impossible,
to meet these assumptions without randomization.
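
A minimal sketch of the classical experiment in its simplest form
follows. The helper names `randomize` and `estimated_effect`, the
hospital labels, and the outcome values are all assumptions made for
illustration; the outcomes are deliberately constructed so that the
treated units score two points lower, and no significance test is
attempted.

```python
import random
from statistics import mean


def randomize(units, seed=None):
    """Randomly split units into a treatment half and a control half."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimated_effect(outcomes, treatment, control):
    """Mean outcome of the treatment group minus that of the control group."""
    return (mean(outcomes[u] for u in treatment)
            - mean(outcomes[u] for u in control))


# Hypothetical units and invented morbidity scores (lower is better).
hospitals = [f"hospital_{i}" for i in range(8)]
treatment, control = randomize(hospitals, seed=0)
outcomes = {h: (4.0 if h in treatment else 6.0) for h in hospitals}
print(estimated_effect(outcomes, treatment, control))
```

Because chance alone decides which half receives the program, neither of
the two assumptions listed above has to be defended separately.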

In addition to the type of true experimental design that has just been
described there are also numerous other designs which fail to meet the
requirements of randomization. These are known as quasi-experimental
designs and require that special efforts be made to rule out plausible
rival interpretations to the hypothesis that the treatment caused the
observed differences.

Classification of Types of Evaluation Research or Program Evaluation:
Level (What It Is That Is Being Evaluated) X Methodology

Now that we have distinguished among five different levels of
evaluation and four different methodologies and described each of these
briefly we can go on to discuss the different types of program
evaluation that are produced by the combinations of these levels and
methodologies. Conceptually one can envision a matrix with
methodologies serving as four column headings and levels of what is
being evaluated serving as five row headings.

Classification Scheme for Types of Evaluation Activities

                               Methodology: How Is It Being Done?
Levels: What Is                             Informal    Comparison      Experimental
Being Evaluated                Description  Evaluation  with Standards  Design

Program Planning
  or Objectives
Program Implementation
  or Structure
Program Operation
  or Process
Program Outcome or
  Production of
  Desired Change
Program Impact

The twenty cells that are produced are what we are referring to as
"types" of evaluation. Actually this matrix oversimplifies the
situation quite a bit. Some of the cells probably do not exist or only
rarely. Some types of evaluation are done at more than one level and
include more than one methodology and therefore are defined by more
than one cell. And in at least one rather important instance a type of
evaluation is done at a level that is included in our matrix (program
impact) but with a methodology that is not included in our matrix. This
is cost-benefit and cost-effectiveness analysis. There are probably
other omissions but despite the artificiality of this matrix it is
hoped that it will serve the useful function of structuring the
following discussion and examples of types of program evaluation.
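
As a rough sketch, the matrix can be enumerated by crossing the five
levels with the four methodologies. The list names and the function
`evaluation_types` are assumptions introduced for this example; the
labels themselves are the row and column headings of the scheme.

```python
from itertools import product

LEVELS = [
    "Program planning or objectives",
    "Program implementation or structure",
    "Program operation or process",
    "Program outcome or production of desired change",
    "Program impact",
]

METHODS = [
    "Description",
    "Informal evaluation",
    "Comparison with standards",
    "Experimental design",
]


def evaluation_types():
    """Return every (level, methodology) cell of the classification matrix."""
    return list(product(LEVELS, METHODS))


cells = evaluation_types()
print(len(cells))
for level, method in cells[:4]:
    print(f"{level} x {method}")
```

An evaluation that spans several levels or methods would correspond to
a set of these cells rather than a single one, and cost-benefit
analysis has no cell at all because its methodology is not among the
four columns.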



Types of Program Evaluation

Evaluating program planning or objectives. As has been mentioned,
level 1, program planning, concerns the social problem area including
what it is that needs improvement. If a specific program has already
been suggested, this level of evaluation attempts to assess whether the
contemplated action is necessary or to determine whether its stated
objectives are appropriate. If a particular program is not yet
specified but action is under consideration, evaluation for program
planning concerns the collection of information that can help lead to
the specification of objectives. As a result of this process these
objectives should then be related to resolving a known social problem
and meeting the needs of the group to which the program is directed. It
can be noted here that in order to perform higher level evaluation
activities, particularly the determination of program outcome or
effectiveness, it is necessary to state objectives in terms of
measurable outcomes. This should be done in the planning stage so that
the program will be implemented in order to best attain these goals.

Evaluating level 1, program planning, is an issue of needs-assessment
and it would appear that the methodologies of description and informal
evaluation are best suited to this end. Thus needs-assessment surveys
or censuses can be conducted prior to the implementation of the new
program. These might utilize some type of health status indicator as a
descriptive index of health as it exists prior to program
implementation. The method will probably not remain descriptive but
will become evaluative when present health status is compared with the
health level that is desired or expected. This is probably done on an
informal evaluative level but the possibility exists that there are
explicit standards that can be used for comparative purposes.

In Emergency Medical Services descriptive information has often been
collected regarding the unmet need for ambulance services. These data
concern those patients who arrive at the emergency room with conditions
serious enough to justify emergency transport but who have not received
such transport. If these data show that many persons (too many
according to an informal evaluation process) are not receiving
emergency transport, they are useful to judge the necessity of a
program to provide more emergency vehicles, etc. and to judge the
appropriateness of this program's objectives to solve this unmet need.
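
A descriptive index of this kind of unmet need might be computed as in
the sketch below. The record layout, the function name
`unmet_need_rate`, the five sample arrivals, and the cutoff
`ASSUMED_STANDARD` are all hypothetical; the paper notes that the
judgment of "too many" is usually informal, so the explicit cutoff here
only stands in for a standard if one existed.

```python
def unmet_need_rate(arrivals):
    """Share of serious emergency-room arrivals that came without transport.

    Each arrival is a dict with two boolean fields: "serious" (condition
    justified emergency transport) and "transported" (it was received).
    """
    serious = [a for a in arrivals if a["serious"]]
    if not serious:
        return 0.0
    missed = [a for a in serious if not a["transported"]]
    return len(missed) / len(serious)


# Invented records for illustration only.
arrivals = [
    {"serious": True, "transported": True},
    {"serious": True, "transported": False},
    {"serious": False, "transported": False},
    {"serious": True, "transported": False},
    {"serious": True, "transported": True},
]

ASSUMED_STANDARD = 0.25  # hypothetical cutoff, not taken from the paper
rate = unmet_need_rate(arrivals)
print(rate, rate > ASSUMED_STANDARD)
```

The rate by itself is description; it becomes evaluation only once it
is compared with some desired level, whether implicit or explicit.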

In Emergency Medical Services Systems collection of descriptions for
the determination of system level objectives is less likely to occur
because there already exist standards of a sort, the fifteen points of
the Emergency Medical Services Systems Act of 1973.

In a recent project to develop a curriculum for training Emergency
Medical Services administrators a needs-assessment survey could have
produced useful information for guiding the development of curricular
materials. It could have been of additional benefit in helping to
predict the likelihood of recruiting persons for such training at both
the initial site and other proposed sites.

The evaluation of program planning and objectives is not really
compatible with the methodology of experimental design. Descriptive
methods such as surveys are particularly good for telling one the
present state of "the world" and evaluating planning is the assessment
of whether the plan and objectives fit "the world".

Thus the type of evaluation that comes out of the combination of level
1, program planning, and descriptive and informal evaluation can be
considered needs-assessment. It is unlikely that any effort at program
evaluation would stop at this initial level of determining need.
However, it is possible that if certain survey questionnaire items
asked if persons would, for example, find more ambulance services
desirable, and the response was quite favorable, then the assumption
might be made as to the probable worth of the new program for increased
ambulance services. The lack of any information on the objective worth
or effectiveness of these services, however, makes this assumption
totally untenable. This type of evaluation, namely needs-assessment, is
probably well-recognized as occurring at the level of program planning.



Evaluating program implementation or structure. In evaluating program
implementation, one is dealing with the inputs of the program such as
resources, equipment, manpower, facilities, etc. Assessment at this
level is most appropriate for what can be called compliance-control
(Alkin in Weiss, 1972). Thus through the use of description of the
resources and facilities of a program it is possible to compare whether
or not the program contains the elements proposed during the planning
phase, or to compare whether or not the program is in compliance with
certain guidelines or standards for its structure. This type of
description of structure is often required for funding purposes. One of
the attractions of assessing the level of program implementation is
that the information to be collected at this level is concrete and
often easily obtained. However, problems arise when the assumption is
made that by describing, one has evaluated more than the program's
implementation. Gibson (1973) has pointed out that the Federal Highway
Safety Act of 1966 contained what were called "performance" or
"outcome" criteria in its Standard No. 11, Emergency Medical Services.
As it turns out these criteria were almost exclusively concerned with
inputs or program implementation, not with outcome measures or program
performance. However, the assumption that was being made, as Gibson
(1973, p. 427) puts it, was that "if facilities exist, they are used,
and if used they make a difference". Thus it was assumed that the
inputs were related to operations or processes and that these
operations necessarily produced the effective outcomes of good medical
care. Similarly accreditation of universities is often made on the
basis of number of books in the library, number of Ph.D.'s on the
faculty, etc. and as with Emergency Medical Services, this emphasis on
resources and facilities does not necessarily provide evidence on
effectiveness. Effectiveness is another level of evaluation and the
assumption of the relationship between inputs and outcomes must be
verified.

Thus evaluation of the level of program implementation through
description and possibly comparison with standards produces what we
have called compliance control. It does not appear that experimental
design is an appropriate means for assessing compliance control. The
misleading confusion of this level with the level of program
effectiveness may be based on the use of a questionable evaluation
process: the conventional wisdom suggesting that good facilities and
resources will result in good outcomes.

Evaluating program operations or process. In evaluating level 3,
program operations or process, one is dealing with program activities:
the services delivered, the practices, strategies, techniques, and
intervention efforts. It is at this level that most of the activities
that are labelled evaluative occur. While not degrading the importance
of knowing what operations do occur in an on-going program it can be
suggested that much of what is termed evaluation occurs at this level
because evaluation here overlaps considerably with management and
administrative activities. As part of the Emergency Medical Services
Systems Act of 1973, those systems receiving federal funding are
required to include an evaluative component. This is often adhered to
through increasing the visibility of those (usually informal)
evaluative activities that occur as part of the program's internal
management. As a result of this, program evaluation often becomes
characterized as a confusing mixture of management and science.

The combination of the level of program operations and the methodology
of description alone or in combination with either informal evaluation
or comparison with standards can be termed descriptive monitoring. This
is an important activity. Through the use of description at the level
of program operations one can characterize exactly what activities are
occurring as part of the program. Operations research and systems
analysis go to great lengths to descriptively characterize what actual
operations occur as part of the program, and what the organizational
functioning of these operations is, including a description of the
relations or links to the other parts of the system. Descriptive
monitoring provides the information necessary to determine whether the
target population of the program is being reached and whether the
activities that are occurring are actually those that were specified at
the planning stage as being related to the program's objectives. These
are important contributions and it will be suggested that even at
higher levels of program evaluation this information is valuable, if
not necessary, for a comprehensive evaluation plan.

The problem that occurs with descriptive monitoring is related to lack
of recognition of the level of this evaluation. The description of
services delivered is not necessarily an indicator of program
effectiveness. Those who would suggest stopping evaluation at this
level make the assumption that the effort expended and the efficiency
of the services are ends in themselves rather than means. Certainly an
efficiently run system and the delivery of services may be necessary
for program effectiveness, but they may not be sufficient. The well
known evaluative criteria of ambulance response time and total rescue
run time in Emergency Medical Services are problematic examples of
remaining at the evaluation level of program operations.

Another rationale for stopping evaluation efforts at the level of
program operations is that program objectives may not have been
operationally defined in terms of measurable outcomes, or the outcomes
may be uncertain or difficult to measure. Thus evaluators may rely on
the use of illustrative incidents, case reports, or testimonials to
provide both description and informal evaluations of effectiveness.
Again, this raises the issue of confusing the level of operations with
the level of outcome.

Alternatively, evaluations may use comparison with standards in order
to make the leap from the measurement of operations to the assumption
of effectiveness. This is a common method used for assessing the
quality of medical care. In a recent study, Frazier, Lalh, and Cannon
(1973) evaluated the quality of care given by emergency medical
technicians by comparing the activities that the technicians performed
with what they called "mandated treatments". Mandated treatments were
explicit process standards of what treatment should be given if a
patient presented with a particular sign or symptom complex. While this
study provided some important information concerning emergency medical
technicians' activities, its value as an index of quality of care is
dependent upon the relationship between the standards (mandated
treatments) and patient outcomes.
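
A comparison of performed activities with process standards of this
kind could be scored roughly as follows. This sketch does not reproduce
the cited study's procedure: the scoring rule, the function name
`process_compliance`, and the placeholder signs and treatments are
assumptions chosen only to show the mechanics.

```python
# Placeholder process standards: sign or symptom complex -> treatments
# that should be given. The labels are deliberately abstract.
MANDATED = {
    "sign_a": {"treatment_1", "treatment_2"},
    "sign_b": {"treatment_3"},
}


def process_compliance(cases, mandated):
    """Fraction of mandated treatments actually performed over all cases.

    Each case is a (sign, treatments_given) pair. Returns None when no
    case called for any mandated treatment.
    """
    required = 0
    performed = 0
    for sign, given in cases:
        needed = mandated.get(sign, set())
        required += len(needed)
        performed += len(needed & set(given))
    return performed / required if required else None


# Invented run reports for illustration only.
cases = [
    ("sign_a", ["treatment_1"]),
    ("sign_b", ["treatment_3"]),
]
print(process_compliance(cases, MANDATED))
```

A high score shows only that operations matched the standards. Whether
it says anything about quality of care still depends on the link
between the mandated treatments and patient outcomes.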

Medical care is also often evaluated through the use of expert
judgments. This can be seen to be the comparison with standards
methodology if one notes that experts are assumed to have useful
internal standards or implicit process criteria of what is usual or
acceptable as a result of their training and experience. Again the
validity of comparing program operations with standards as measures of
program effectiveness is dependent upon the validity of the
relationship between the end result (e.g., patient outcome) and the
operation. This validity may have been tested through earlier studies
as is the case with many professional standards for which data exist
clearly supporting the desirability of the operations. However, many
practices go on because of tradition and professional values rather
than data concerning effectiveness. As Bernstein and Freeman (1973)
point out this is the case for the evaluation of school health programs
where the annual physical exams for children are probably inappropriate
evaluative criteria.

Thus, the level of program operations can be validly assessed through
the means of descriptive monitoring. Experimental design is probably
not necessary for this purpose. A common problem, however, is to assume
that one has evaluated more than the level of operations. Procedures
that compare program activities with standards not for description and
compliance alone but for making judgments concerning outcomes must
recognize the possibility of invalid causal links between the
activities and the outcomes.

Evaluating program outcome or production of the
desired changes. The level of evaluation dealing with
program outcome or the production of the desired
changes has been defined as dealing with the
overall effectiveness of the program to meet its
predetermined objectives. As was noted, these
objectives usually relate to measuring improvements or
changes in the target population. For example, the
objectives of Emergency Medical Services Systems
may be defined as the prevention of disability and
suffering in persons with injury or acute illness
(Willemain, 1974). Thus assessing program
outcomes in Emergency Medical Services can be done
in terms of the reduction of death, disability and
suffering or alternatively in terms of improving
health status.

The combination of level 4, program outcome
measurement, with description and informal
evaluation can result in the case study. In this type of
evaluation information is collected on the target
group only after exposure to the program. The
criteria that are measured may be appropriate
operationalizations of the stated objectives, or this
method can also be used when objectives have not
been operationally defined. In either situation, the
case study provides a completely inadequate
assessment of the program's effectiveness or
production of desired changes. There is no explicit
comparison which allows one to attribute observed
changes to the program itself. The only
comparison is an informal, implicit comparison with one's
previous experience. As has been noted, the
problem with any type of informal evaluation is its
unknown biases.

The methods of description, informal
evaluation, and comparison with standards at the level of
program outcomes can also produce what can be
called performance monitoring. This is very much a
part of operations research and systems analysis
and differs from descriptive monitoring in that the
actual operationalizations of the program
objectives are being assessed (level 4 rather than level
3). Often specific performance objectives are
developed or projections are made as to what level of
performance should be achieved within a certain
time period. This type of forecasting is often made
on a weak empirical basis. Comparisons can also be
made with past program performance or
occasionally with the performance of a similar
program.

Rees (in Boruch & Riecken, 1975) makes the
point that the types of information systems that
are developed for management and performance
monitoring are usually inadequate for the accurate
estimation of program outcomes. Although
outcomes are often measured, there is usually no
information on comparison groups who do not
receive the program. Without this type of
comparison it is difficult to attribute effects or outcomes to
the program itself. Rees also notes that even
though time series data are sometimes provided,
that is, measurements prior to and after program
implementation, they are too short (there are too
few measurement points) to be interpreted with
much confidence. Rees' final criticism of
performance monitoring as an evaluative approach to
determine effectiveness of outcome is that it is
mistaken to believe that the simple collection of
information on program participants, without use of
a research design, can produce good evaluations.

Despite the problems involved, program
decisions are based on performance monitoring. As has
been mentioned, the methodology often conforms
to the comparison with standards approach
(program performance is compared with some relative
or absolute standard of expected performance in
order to determine the extent to which program
objectives are being met). There is, however, no
test of other causal factors having produced these
results rather than the program itself. Despite this
methodological limitation, performance monitoring is
used as the data base to plan, alter, and adjust
program activities in order to increase the
probability of achieving program goals.

The defense for utilizing information that is
of unknown validity is, of course, one of
administrative necessity. Program managers are faced with
the need to take action on the basis of incomplete
information and performance monitoring is often
all there is to go on. In addition, experimentation
is not the answer for all questions of validity in
program planning, development, and management.
As Campbell (1974) has noted, much of this is
mainly a matter of common sense knowing: it
would be cumbersome to do an experiment on all
features; many errors of planning are visible to
the naked eye. After something is implemented,
one can often see that it is not acceptable or not
what was expected. Campbell uses the analogy of
debugging a computer program here. It could be
suggested also that if a program manager sought
all of the answers to validity questions he would
lose much of his time and resources without
delivering many services.

There is a problem, though, if one's
orientation is to equate evaluation solely with a model of
continuous performance monitoring for
immediate feedback to make revisions and alterations
of program elements. In curriculum evaluation,
where this type of continuous monitoring with
feedback is known as formative evaluation, this
process is considered as a precursor to a
summative or outcome evaluation. Thus if the real
question of interest concerns the level of program
outcome or effectiveness, program managers should
be encouraged to go beyond performance
monitoring and to introduce planned variations
into their projects. There are opportunities for the
evaluation of the effectiveness of different
strategies and different components through the
use of experimental designs. In addition, program
managers can begin to collect better time series
data so that if true experiments prove unworkable,
quasi-experiments can be attempted.

The evaluation of level 4, program outcome,
through the method of experimental design is
generally considered the most appropriate way to
measure program effectiveness or outcome. The
classical experimental design including random
assignment of subjects to the treatment condition
and a control no-treatment condition has been
described earlier along with its advantages. The most
important issue is that an appropriate comparison
must exist so that the measured changes or
outcomes can be causally linked to the program or
treatment and cannot be accounted for in other
ways.

A number of research projects in Emergency
Medical Services have utilized the combination of
outcome measurement and experimental design.
For example, Wortman (1975) reports on a study
by Fletcher where the effectiveness of a "follow-up
clerk" in an emergency room was being evaluated.
This study included measurements at both the
operations (process) level and at the outcome level.
The methodology was the classic experimental
design. Patients who came to the emergency room
were randomly assigned to either a "follow-up
clerk" who phoned to remind them to keep
appointments or to the usual procedure of receiving
only an appointment slip. At the level of
operations the clerk was successful in encouraging more
people to return for treatment as compared with
the control condition. And records showed that
the "encouraged" patients received significantly
more diagnostic tests than their control
counterparts. However, when outcome criteria of health
were measured, there was no difference between
the two groups. This study thus suggested that
there was not a causal link between health care and
increased health in this situation.
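
The logic of a design like the follow-up clerk study can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patient identifiers, the appointment and health figures, and the function names are all hypothetical and are not taken from the study reported by Wortman. The sketch assigns patients to two conditions at random and then compares the groups separately on a process measure and on an outcome measure.

```python
import random
from statistics import mean

def assign_randomly(patient_ids, seed=0):
    """Randomly split patients into a clerk group and a control group."""
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def group_difference(measure, treated, control):
    """Difference in group means for one measure (treated minus control)."""
    return mean(measure[p] for p in treated) - mean(measure[p] for p in control)

# Hypothetical illustration with 200 made-up patients.
patients = range(200)
clerk_group, control_group = assign_randomly(patients, seed=1)
clerk_set = set(clerk_group)

rng = random.Random(2)
# Process measure (assumed): 1 if the appointment was kept, else 0.
kept = {p: int(rng.random() < (0.8 if p in clerk_set else 0.5)) for p in patients}
# Outcome measure (assumed): a health score unrelated to group membership.
health = {p: rng.gauss(70, 10) for p in patients}

process_diff = group_difference(kept, clerk_group, control_group)
outcome_diff = group_difference(health, clerk_group, control_group)
print(f"process difference: {process_diff:.2f}")
print(f"outcome difference: {outcome_diff:.2f}")
```

Because the two groups differ only by the random assignment, a difference on the process measure with no difference on the outcome measure points to a missing link between the activity and the result, not to a difference between the kinds of patients in each group.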

A study is being conducted in Chicago by
Sherman (1976) to evaluate the effectiveness of
mobile intensive care units (MICUs) in reducing
deaths due to myocardial infarction. This study at
the outcome level is utilizing the research design
of a multiple time series. This design involves a
historical comparison process. A number of
Chicago area communities have recently
implemented MICUs and Sherman plans to gather
mortality data both prior to the implementation of
these units and subsequent to it to determine if the
introduction of MICUs changes the pattern of
these data.
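
A multiple time series comparison can be summarized with simple arithmetic, as the following Python sketch suggests. It rests on assumptions that are not in the text: the mortality figures are invented, the function names are hypothetical, and the use of comparison communities without the program follows the general logic of the design rather than any stated detail of Sherman's study.

```python
from statistics import mean

def pre_post_change(series, start_index):
    """Mean after the program starts minus mean before it starts."""
    before, after = series[:start_index], series[start_index:]
    return mean(after) - mean(before)

def multiple_time_series_contrast(program_series, comparison_series, start_index):
    """Average change in program communities minus average change elsewhere."""
    program_change = mean(pre_post_change(s, start_index) for s in program_series)
    comparison_change = mean(pre_post_change(s, start_index) for s in comparison_series)
    return program_change - comparison_change

# Invented yearly deaths per 100,000; the program starts at index 4.
with_program = [[52, 50, 51, 49, 44, 43, 42, 41],
                [60, 61, 59, 60, 55, 54, 53, 52]]
without_program = [[55, 54, 55, 53, 53, 52, 52, 51],
                   [48, 49, 47, 48, 47, 47, 46, 46]]

contrast = multiple_time_series_contrast(with_program, without_program, start_index=4)
print(contrast)
```

A contrast near zero would suggest that the program communities changed no more than the others did, so that a general trend, and not the program, is the more plausible explanation of any decline.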

One point that should be made concerning
experimental designs at the level of program
outcomes is that such studies are often greatly
enhanced by the collection of evaluative data at the
level of program operations or processes. It may
appear obvious, but it is a good idea to know
exactly what took place during a program; otherwise
one may be dealing with the outcome or
effectiveness of a treatment that is very different from
what one thought one was examining. To illustrate
this point, Hyman and Wright (1967) relate a story
about the evaluation of a propaganda campaign
based on the distribution of fliers. Due to a severe
shortage of volunteers, however, it was never
possible to distribute these fliers. Thus had the
evaluation taken place, the conclusion that the
distribution of literature was not effective in producing
the desired outcome, attitude change, would have
been quite misleading. While evaluating a literally
nonexistent treatment may not be too much of a
threat to Emergency Medical Services, the
collection of process data can provide other useful
information. The following are some important uses
of process information:

(1) Process information can provide data
concerning unanticipated or undesirable as
well as desirable outcomes.

(2) Process data can provide an independent
cross-validation of the outcome effects.

(3) Process data can provide important
information for estimating the plausibility of
rival threats to interpretation in
quasi-experimental designs.

(4) Process data can provide information for
new hypotheses.

Evaluating program impact. Program impact was
defined not as the equivalent of program outcome,
as the term is sometimes used, but instead as the
effect of the program on the broader community,
those outside of the population consisting of the
consumers of the program's services. Therefore
what it is that is being evaluated is community
outcomes.

This level of evaluation can be combined with
any of the methodologies but it is most likely to be
assessed through description. Thus a descriptive
base that is broader than the population served by
a program can be part of program impact
evaluation. As Attkisson et al. (1974) point out, the "social
ecology" of the whole community has become an
important area of concern for evaluation.

Community impact can also be assessed in a
research design which is testing the hypothesis:
would this community be any different if the
program did not exist or if the program had taken a
different form? This type of evaluation can be
particularly useful if the program is predicted to
produce effects at the community level. It would seem
in Emergency Medical Services that a study
designed to evaluate the effectiveness of
categorization or health planning councils should attempt to
assess community impact. Thus the effects of
interest would be system effects rather than
individual effects.

If it were determined that the role of
Emergency Medical Services Systems appears to be to
change the site of death from in the field to in the
emergency room (as has been hypothesized by
Gibson), a legitimate question concerns the impact
on the community of these services.

It can also be suggested that when cost/benefit 
and cost/effectiveness analyses are applied to pro- 
grams what it is that is being evaluated is program 
impact.

Cost/benefit analysis can be viewed as a step
beyond the level of program outcomes both because
it utilizes information on outcomes in order to
quantify benefits and because it deals with social
evaluations, not individual evaluations. Cost/benefit
analysis is an approach which attempts to quantify
both the costs and benefits of programs in order to
determine whether the benefits achieved by a
program exceed the costs. This approach appears to
be best suited to comparisons among alternatives.
Since few programs can be justified at any cost,
this type of analysis produces information that is
relevant at the community level.
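
The arithmetic behind a cost/benefit comparison among alternatives is brief enough to sketch in Python. Everything specific here is an assumption made for illustration: the alternative names, the dollar figures, and the class and function names are hypothetical, and the sketch sets aside the hard problem of placing a dollar value on outcomes.

```python
from dataclasses import dataclass

@dataclass
class Alternative:
    name: str
    cost: float     # total program cost in dollars (hypothetical)
    benefit: float  # dollar value placed on the outcomes (hypothetical)

    @property
    def net_benefit(self):
        """Benefits minus costs; positive when benefits exceed costs."""
        return self.benefit - self.cost

    @property
    def ratio(self):
        """Benefit/cost ratio; above 1 when benefits exceed costs."""
        return self.benefit / self.cost

def rank_by_net_benefit(alternatives):
    """Order alternatives from largest to smallest net benefit."""
    return sorted(alternatives, key=lambda a: a.net_benefit, reverse=True)

options = [
    Alternative("option A", cost=400_000, benefit=600_000),
    Alternative("option B", cost=900_000, benefit=1_000_000),
    Alternative("option C", cost=250_000, benefit=200_000),
]

ranked = rank_by_net_benefit(options)
for option in ranked:
    print(option.name, option.net_benefit, round(option.ratio, 2))
```

Ranking by net benefit and ranking by ratio can disagree when programs differ greatly in size, which is one reason the approach is better suited to comparing alternatives than to judging a single program in isolation.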

In summary, a classification scheme has been
suggested which describes types of program
evaluation activities in terms of what it is that is being
evaluated (level) and how it is done (methodology).
The five levels of evaluation considered were: (1)
program planning or objectives; (2) program
implementation or structure; (3) program operations
or process; (4) program outcome or ability to
produce change; and (5) program impact. The
methodologies were (1) description; (2) informal
evaluation; (3) comparison with standards; and (4)
experimental design. Two persistent problems in
the evaluation area appear to be lack of the
recognition of the level of the evaluation and lack of
recognition of the limitations of certain
methodologies. Examples from Emergency Medical Services
were presented and the suggestion was made that
comprehensive evaluation strategies should
include more than one type of evaluation.

References 

Attkisson, C.C., McIntyre, M.H., Hargreaves,
W.A., Harris, M.R., & Ochberg, F.M. A working
model for mental health program evaluation.
American Journal of Orthopsychiatry, 1974, 44,
741-753.

Bennett, C.A. & Lumsdaine, A.A. Evaluation and
experiment. New York: Academic Press, 1975.

Bernstein, I.N. & Freeman, H.E. Academic and
entrepreneurial research. New York: Russell Sage,
1975.

Boruch, R.F. On approximation to true
experiments. Paper presented at Loyola Institute on
Evaluation Methodology, Loyola University,
Chicago, 1974.

Boruch, R.F. & Riecken, H. Experimental testing of
public policy. Boulder, Colorado: Westview Press,
1975.

Campbell, D.T. Reforms as experiments. American
Psychologist, 1969, 24, 409-429.

Campbell, D.T. Qualitative knowing in action
research. Address presented to American
Psychological Association meeting, New Orleans, September
1, 1974.

Frazier, W.H., Lally, P.P., & Cannon, J.F. EMT
performance evaluation: A clinical trial. Yale-New
Haven Hospital, 1973.

Gibson, G. Evaluative criteria for emergency
ambulance systems. Social Science and Medicine, 1973,
7, 425-454.

Hyman, H.H. & Wright, C.R. Evaluating social
action programs. In P.F. Lazarsfeld et al. (Eds.),
The uses of sociology. New York: Basic Books, pp.
741-782.

Riecken, H.W. & Boruch, R.F. Social
experimentation: A method for planning and evaluating social
intervention. New York: Academic Press, 1974.

Sherman, M.A. An evaluation of mobile intensive
care units. Manuscript, Northwestern University,
1976.

Suchman, E.A. Evaluative research. New York:
Russell Sage, 1967.

Weiss, C.H. Evaluation research: Methods of assessing
program effectiveness. Englewood Cliffs, New Jersey:
Prentice Hall, 1972 (a).

Weiss, C.H. Evaluating action programs: Readings in
social action and education. Boston: Allyn and
Bacon, 1972 (b).

Willemain, T.R. The status of performance
measures for emergency medical services. MIT
Operations Research Center technical report no. 06-74,
1974.

Wortman, P.M. Evaluation research: A
psychological perspective. Manuscript, Northwestern
University, 1974.






Experimental Design 
Causal Inference 

Sechrest

Professor of Psychology
Florida State University
Tallahassee, Florida 





4. 



In designing research on the effectiveness of some program or other intervention the problem is to design the research in such a way
as to produce data which are as unambiguously interpretable as possible. The interpretation which is desired is that a particular
program or treatment definitely did or definitely did not have an effect on the outcome variables measured. In the following paper
Sechrest, an evaluation research methodologist, discusses the problems that are involved in designing research which will produce
convincing results.






The aim of every evaluation project should be
to produce an unambiguous inference concerning
the worth of the intervention being evaluated. To
produce such an inference is rarely a
straightforward matter, and it often involves technological
and methodological issues of truly formidable
complexity. However, to the degree that the final
inference of worth is in doubt or is otherwise
ambiguous, the purpose of the evaluation is vitiated.
It is the thesis of this paper that methodologically
sound experimentation is the surest way of
reaching causal inferences of reasonable certainty.

An experimental study of a social intervention
is devised to yield information permitting the
inference of a causal link between the intervention
being studied and the outcome. In the discussion
which follows the experimental methods that may
be employed in program evaluation are presented.
While a strong case can be made for carrying out
true experiments, to be defined later, in evaluating
social programs, it is evident that such
experiments cannot always be accomplished, and some
approximations are required and may be
reasonably tolerable. In the discussion which follows
some of the methodological problems which, if not
peculiar to program evaluation, often plague it are
discussed also.

Scientific Methods of Investigation

Experimentation is not the only method of
science. Cochran (1955), one of the foremost
figures in development of experimental designs and
their associated statistics, describes three
approaches to scientific investigation: chance
observations, planned observations, and experiments. Scientific
inferences have often come from some very
unusual happenings noted by an alert scientist. The
apple falling on Newton's head, the unusual
contamination of some plates in Alexander Fleming's
laboratory, and the identification of vinyl chloride
as a carcinogen because of the common home
location of several cancer victims are only three
instances of a multitude of serendipitous
observations. In contrast to chance observations are
investigations which use highly detailed, planned
observation schedules such as used in the Peterson
study of physician performance (1956), which
involved the use of highly detailed protocols for
observation and the use of highly trained observers.

While scientific inference can be a product of
observation, intuition and judgment are not often
the basis for very firm inferences about causes and
effects. Strong causal inferences are most often
derived from specially contrived experiments. The
word experiment connotes an interference with the
ordinary occurrences of nature. Here we
deliberately apply certain chosen procedures for the
purpose of measuring the effects of these procedures.
An experiment is the surest way of elucidating
relationships that we are interested in observing or
demonstrating. With the observational method,
inferences of causal linkages derived from
correlations would be hazardous and uncertain. For
example, a recent newspaper story indicated that
podiatrists have found that cardiac disease victims
have an unusually high incidence of bunions.
However, just what links bunions to cardiac
disease is open to question; the podiatrists think
bunion sufferers get less exercise. It would be even
more hazardous if one relied upon intuition to
infer causation. The precepts of science demand
observable phenomena as evidence for any
assertions.

Essentially the problem in evaluation research
as in other areas of science is to make observations
in such a way as to permit the drawing of
inferences of a causal nature linking some treatment,
or independent variable, with an outcome, or
dependent variable. Ideally we would like to be able
to make an unambiguous inference, such as:

— If two hospitals of medium size are merged,
costs per unit of service delivered will go
down.

— If unemployment in a given area goes up,
there will be an adverse effect on average
health status of residents within one year.

— If food service workers are provided with an
incentive to reduce waste, there will be a
decrease in waste greater than the cost of the
incentive.

Unfortunately the inferences we are permitted are
rarely so straightforward. More often they will be
of the form:

— The merger of hospitals of medium size is
often associated with a decrease in costs per
unit of service delivered. (But it may have
been because hospitals tend to merge when
costs are abnormally, but temporarily, high.)

— When unemployment in a given area goes
up, there is likely to be a decrease in average
health status of residents within one year.
(But maybe because the healthier people
leave the community.)

— An incentive program to reduce waste was
introduced into a food service, and waste
went down. (But maybe because there was a
change in food processing procedures
during the study. Or maybe merely because the
incentive program drew attention to the
problem.)

Plausible rival hypotheses

The reason we very often cannot arrive at
clear-cut inferences of a causal nature is that our
observations or investigations were conducted in
such a way as to leave tenable or possible one or
more rival explanations to the one we favor. Such
rival explanations have been called "plausible rival
hypotheses" by Campbell and Stanley (1963). We
are all familiar with the tenets of experimentation
and use them regularly in our daily life. We cannot
start our car. We hypothesize that our battery is
dead, and we try the lights, horn, or radio and
find plenty of power. Our little experiment
weakened, or even left unacceptable, the
hypothesis that our battery was dead. So we go on
to another hypothesis. Or a neighbor says, "I finally
found some good tomato plants this year! Look at
them; they are twice as large as the ones I planted
last year!" Is it possible that he put more fertilizer
on them? Have we had better weather this year?
Each of those ideas is a plausible rival hypothesis
to the one that the plants are superior. In the
process of planning research we will be assisted
greatly if we ask ourselves what alternative
explanations for our findings will still be possible after
we have completed our study, and we will be better
able to interpret research findings if we ask what
alternative explanations might account for
findings available to us.

Our aim in research is to rule out as many
rival hypotheses as possible as surely as possible.

The problem with many types of research, and
with all poorly done research, is that plausible
explanations are left open and reasonable. Under
most circumstances, correlational studies, i.e.,
studies involving natural observations, do not
permit one to rule out the possibility that some
underlying or third factor may account for the
findings. Smokers have a high rate of lung cancer,
but many people still believe that there might be
some underlying factor that causes people both to
want to smoke and to be susceptible to lung
cancer. Some parts of the U.S. have unusually high
or low rates of certain types of cancer, and maybe
it is because of the mineral content of water and
foods in those areas. But maybe also the areas
differ in the genetic stock of residents of them,
maybe people who like the particular climates or
living conditions in those areas have dispositions to
particular forms of cancer, or maybe some other
mysterious force is operating. How could we get
definitive answers? We could not, in fact, but if it
were feasible and acceptable in a free society, we
could take a sample of teen-age boys and teach
some of them to smoke tobacco and prevent others
from doing so. If we chose randomly which boys
were to go in which group, in twenty years or so
we would begin to find out the real answer to the
smoking-lung cancer question. In the other case,
we could assemble sizable groups of people and
then pick randomly from them some to be sent to
live in Nebraska, some in New Mexico, some in
Georgia, etc. Again, in twenty years or so we would
begin to get the data which would answer our
question about geography and cancer. Clearly not
all questions can be answered by such
experimentation. It is part of the art and science of research
design to conceive ways of gathering data on
problems in such a way as to zero in on the right
answer even if a really high degree of certainty can
never be achieved.
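
A small simulation can make the third-factor argument concrete. The Python sketch below describes a made-up world, not real epidemiology: a hidden factor raises the chance of an outcome, the exposure itself has no effect at all, and every probability and name in the code is an assumption chosen for illustration. When people select their own exposure, the exposed group still shows a higher outcome rate; when exposure is assigned by a coin flip, the gap largely disappears.

```python
import random

def outcome_rate_gap(n, randomize, seed=0):
    """Outcome rate among the exposed minus the rate among the unexposed.

    A hidden factor raises the chance of the outcome. The exposure itself
    has no effect in this invented world.
    """
    rng = random.Random(seed)
    exposed_outcomes, unexposed_outcomes = [], []
    for _ in range(n):
        hidden = rng.random() < 0.5
        if randomize:
            # The investigator's coin flip decides exposure.
            exposed = rng.random() < 0.5
        else:
            # Self-selection: the hidden factor also drives exposure.
            exposed = rng.random() < (0.8 if hidden else 0.2)
        outcome = rng.random() < (0.30 if hidden else 0.05)
        (exposed_outcomes if exposed else unexposed_outcomes).append(outcome)
    return (sum(exposed_outcomes) / len(exposed_outcomes)
            - sum(unexposed_outcomes) / len(unexposed_outcomes))

observational_gap = outcome_rate_gap(20_000, randomize=False, seed=1)
experimental_gap = outcome_rate_gap(20_000, randomize=True, seed=1)
print(f"observational gap: {observational_gap:.3f}")
print(f"experimental gap:  {experimental_gap:.3f}")
```

Random assignment does not remove the hidden factor. It spreads that factor evenly across the groups, so the factor can no longer masquerade as an effect of the exposure.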

The problems in program evaluation are not
different in kind from those posed above; the
differences lie mainly in complexity and scope. Still,
the aim of program evaluation ultimately is to be
able to say with a high degree of certainty that
whatever outcomes (or impact) are achieved, they
are the result of the program itself and no other
factor. We want to be able to say that it was the
program itself and its particular characteristics
that led to change or differences and that the
change would not have occurred anyway, that
differences are not attributable to the way the
subjects for the study were selected for different
treatments, that the results could not have been
attributable to events happening outside the context
of the study being conducted, and so on. In the
discussion that follows, we will discuss some of the
types of study designs that might be employed in
evaluating programs and what the advantages and
disadvantages of each are likely to be. A much
fuller treatment of this topic may be found in the
now classic monograph by Campbell and Stanley
(1963), virtually must reading for any serious
student of research design. A recent updating of
that monograph by Cook and Campbell (1976) will
also be very helpful.

The Why of Experimentation

Why do we do experiments in the first place?
Well, presumably because we are uncertain about
the effect of some treatment and
want to make some observations that will lead to a
definitive conclusion. An experiment is a way of
putting a question to nature or reality. But there
are other ways of avoiding or reducing uncertainty
than through experimenting. At least one
possibility does not even involve making any
observations: logic, or reasoning. We may not be
uncertain in the first place because all reason tells us
that some treatment or some course of action is
good. One wag, for example, pointed to a sure
cure for the problem of poverty. His reasoning
was impeccable. Poor people suffer from a lack of
money; ergo, give them some money, and they will
not be poor any longer. The problem with
reasoning is that it is often wrong. One little error in a
premise can lead to utterly wrong conclusions. A
great many medical treatments that are perfectly
logical are also perfectly wrong. The same can
surely be said for a great many social
interventions. Still, when all else fails, when there is no
possibility of doing any kind of empirical study of
a problem, reasoning is the reasonable thing to do.
Many interventions having to do with
reduction of cost of operations may be examined in a
logical manner. It requires no large scale
experiment to decide that if two people are employed on
a task that keeps either of them busy only a third
of the time, money can be saved by eliminating one
position. Still, we should be slow to jump even to
financial conclusions, because very often we do not
have all the information we need and do not even
know that it is needed. A good example is
provided by the use of one-officer police patrol cars in
place of two-officer cars. It only seems logical that
one-officer cars would save money since most of
what police officers do, e.g., writing traffic tickets,
taking non-injury accident reports, clearly does
not require two officers. But a one-officer car
deployment strategy doubles the number of cars
needed if the same number of officers is to be
available on the streets. Moreover, there are many
types of calls, e.g., disturbance calls, accidents that
require redirecting traffic, etc., that require two
officers so that two cars have to be dispatched.
Some police officials maintain that two-officer cars
are less likely to be involved in accidents than
one-officer cars; other officials maintain the
opposite. The matter has not been resolvable by logic,
and it is clearly going to require a fairly major
research effort even to come close to a definitive
conclusion.

A second way of reducing uncertainty that
does not require time-consuming and expensive
data collection is to capitalize on the experience,
preferably based on research, of others. Portacaval
shunt surgery does not have to be tested in every
hospital. Employment of nurse practitioners does
not have to be tested in every pediatric clinic.
Where data, good data, are available, they can be
used as a basis for decision-making, and they
should be. To do so, however, requires knowledge
of the existence of the data, and some degree of
expertise in interpreting the data. One or more of
those factors may be lacking for any given problem
or in any given setting. Where the requisites are
met, though, the need for new data collection is
obviated. A change in practice can be instituted
and all that needs to be done is to determine
whether the change seems to produce the expected
results.

A third way of developing a basis for
decision-making that exists in some few instances
is through simulation, usually with the aid of a
computer, of the projected change. For example,
one group did a detailed and extensive task
analysis, rather like a time and motion study, of
emergency room operations, of case loads, waiting
times, personnel availability and so forth. They
were then able to simulate on a computer the
effects of various changes in emergency room
staffing such as cutting back on physicians and
increasing nurses, etc. The problems with computer
simulation begin with the need for a great deal of initial
data collection as input for the simulation and end
with the need for a considerable leap of faith in
deciding to implement a change because the
computer says that it ought to work. A computer can
only do what it was programmed to do by some
human, and how it behaves is dependent upon
what was originally programmed for its behaviors.
A computer may not be able to tell, for instance,
that two people working together will produce less
work than expected because they will spend a
certain amount of time in gossip or other
interpersonal affairs.
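
A toy version of such a staffing simulation can be written in a few lines of Python. It is a sketch under loud assumptions, not a reconstruction of the study described: the arrival and treatment times are invented, patients are seen first come, first served, and the function name `mean_wait` is hypothetical. A real simulation would take its inputs from the kind of task analysis the text mentions.

```python
import heapq
import random

def mean_wait(num_staff, arrivals, service_times):
    """Average minutes a patient waits before a staff member is free."""
    free_at = [0.0] * num_staff  # time at which each staff member is next free
    heapq.heapify(free_at)
    total_wait = 0.0
    for arrival, service in zip(arrivals, service_times):
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)
        total_wait += start - arrival
        heapq.heappush(free_at, start + service)
    return total_wait / len(arrivals)

# Invented inputs: 500 patients, about 10 minutes apart on average,
# each needing about 25 minutes of staff time on average.
rng = random.Random(42)
arrivals, clock = [], 0.0
for _ in range(500):
    clock += rng.expovariate(1 / 10)
    arrivals.append(clock)
service_times = [rng.expovariate(1 / 25) for _ in arrivals]

waits = {staff: mean_wait(staff, arrivals, service_times) for staff in (2, 3, 4)}
for staff, wait in waits.items():
    print(f"{staff} staff: mean wait {wait:.1f} minutes")
```

The sketch also illustrates the caution raised above: the program assumes every staff member works at full speed whenever a patient is waiting, so its answers are only as good as that assumption and the input data.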

Note that even if changes are introduced on
the basis of one of the factors just mentioned,
there is still a need, or should be a need, to
determine whether they are effective in the new setting
in which they take place. The administrator, it
seems to us, has only two choices once the decision
has been made to introduce a change in practice or
procedure: 1) the change can be assumed to be
effective, or 2) data can be collected by which
effectiveness can be judged. We have come full circle.
The need for data collection cannot be avoided
unless one wants to operate on the basis of optimistic
ignorance. If a decision to obtain data is made, the
only question that remains is the adequacy of the
data for the purpose of making a judgment of
effectiveness. That is what evaluation research
methodology is all about, and that is why we
experiment.

Rendering Hypotheses Implausible

Strictly speaking, we never prove that a
hypothesis or an explanation is the correct one.
There is always some alternative that might be
dredged up. What we can do is make observations
that will make the most likely alternatives
implausible or untenable. Under ideal circumstances all
the really plausible alternative explanations but
one can be eliminated, and a rather strong
inference about the effect of some change can be made.
How to eliminate or seriously weaken those
alternatives is what experimental design is about. It is
often helpful in understanding the problems that
are involved to begin with some obvious, but
faulty, types of designs in order to illustrate in a
fairly dramatic way what the problems are.

Let us first note, however, the most ubiquitous
plausible rival hypothesis of all: chance. No matter
how well an experiment is conducted, we can
never be absolutely certain that the obtained
results could not have happened by chance. If we
saw someone flip a coin ten times and get heads
every time, we might well be suspicious of either
the coin or the way it was being flipped. But if
there were a thousand people flipping coins ten
times, there is a high probability that at least one
of them would get ten heads in a row. There is,
fortunately, through application of appropriate
statistical procedures a way of telling in most
instances whether an obtained finding could have
occurred by chance or not. In effect, what we get is
a statement of the plausibility of chance as an
explanation in the form of a probability statement.
Thus a statement that a difference between two
comparison groups is "statistically significant" at
the .01 level means that chance as an explanation
of the difference is implausible since there is only
one chance in 100 that a difference of the size
obtained could have happened by chance. Note,
however, that no matter how significant a
statistical finding may be, there is always some possibility
that the result might have occurred by chance. As
a rival hypothesis chance can never be completely
ruled out; it can only be seriously weakened.
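The coin-flipping argument can be checked with a short calculation. The sketch below assumes fair coins and independent flippers; the function name is ours, introduced only for illustration.

```python
def prob_at_least_one_run(people=1000, flips=10):
    """Chance that at least one of `people` gets heads on all `flips` fair tosses."""
    p_single = 0.5 ** flips                  # one person: 1/1024 for ten flips
    return 1.0 - (1.0 - p_single) ** people  # complement of "nobody does it"

print(prob_at_least_one_run(1, 10))      # a single flipper
print(prob_at_least_one_run(1000, 10))   # a thousand flippers
```

For a single flipper the chance is 1 in 1,024, which is why ten straight heads looks suspicious; across a thousand flippers the chance that somebody does it rises to better than even, roughly 0.62 under these assumptions.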

Suppose a county health department,
concerned with increasing rates of venereal disease,
develops a special counselling program for all
repeat victims and applies for funds to implement
the program. A funding agency, whether a county
health board or a state health department, might
well ask: "Does the program do any good?" The
smart administrator would have anticipated that
question. There are several things the
administrator might have done to prepare to answer such
a question. At the very simplest level, he might
have tried the counselling program on a group of
VD repeaters and noted the number who returned
for treatment within the following year. Let us
suppose that 20% returned. What would such a
result show? Unfortunately almost nothing other
than that the counselling program can be
operated. If we were on a board expected to produce
funds for health programs, we would be inclined
to ask such questions as: How many would have
returned without the counselling? Data of that sort
simply cannot constitute evidence for effectiveness
of any program. They fall into the category of "I
feed my dog these Pamby biscuits, see how healthy
he is!" In their discussion of research designs
Campbell and Stanley (1963) refer to the
foregoing type of "evidence" as the "one-shot case
study." In their presentation of different types of
research designs they employ a useful notation
which designates a treatment or intervention, in
this case counselling, as X and a measurement or
observation as O. Thus, the one-shot case study is
diagramed as X O, a treatment followed by a
measurement.

A slight improvement on the case study would
be effected if the administrator had examined his
records to determine that prior to the counselling
program 40% of VD repeaters returned for
treatment within one year, resulting in a one-group
pretest-posttest design, diagramed O X O.
However, we skeptics on the funding board might still
ask such questions as:

— Is it possible that VD rates are going down
anyway?

— Have operations of the clinic changed in any
way that might make repeaters less likely to
come in?

— Since the repeaters are clearly growing
older, and VD rates tend to be lower in older
groups, is it not possible that this
repeater group would be less likely to
contract new cases?

— Could there have been a public education
campaign or perhaps a TV series
dramatizing the dangers of VD during the same time
period as the counselling and hence possibly
accounting for the drop?

— Was the counselling program started
because it was noticed that there were a great
many repeaters at that time? If so, is it not
likely that subsequently the number would
go down anyway, as these things usually even
themselves out?

Each of the above questions is based on an implicit
plausible rival hypothesis that might account for
the findings equally as well as the counselling
program.

If the administrator were able to state that in
a group of VD repeaters seen in the clinic but
unable for one reason or other to participate in the
counselling program, the repeat rate was at
40%, that would be termed a static group
comparison and diagramed

X O
- - -
  O

the dotted line indicating
that the groups were not to be considered strictly
comparable, as they might be if they had been
selected randomly either to receive or not receive
the counselling. Such a study might indicate that
clinic procedures, community education
campaigns, or whatever could not account for the
findings, but that would require the assumption that
the groups were really comparable to begin with.
If, for example, the comparison group consisted
mostly of hard core repeaters who refused to
participate in counselling, then it is conceivable that
their rate would be higher anyway. Such a
comparison group would add very little certainty to
the interpretation of the findings.
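The hard core repeater problem can be made concrete with a little arithmetic. In the sketch below every number is a hypothetical illustration, not data from the clinic example: counselling is assumed to have no effect whatever, hard core repeaters are assumed both to return more often and to refuse counselling more often.

```python
def static_group_rates(n_hard=300, n_other=700,
                       rate_hard=0.6, rate_other=0.3,
                       refuse_hard=0.8, refuse_other=0.2):
    """Expected repeat rates in each group when counselling has NO effect.

    All figures are hypothetical. The only thing that differs between the
    groups is who chose to be in them.
    """
    # Who ends up counselled versus in the comparison group.
    counselled_hard = n_hard * (1 - refuse_hard)
    counselled_other = n_other * (1 - refuse_other)
    comparison_hard = n_hard * refuse_hard
    comparison_other = n_other * refuse_other

    def rate(hard, other):
        return (hard * rate_hard + other * rate_other) / (hard + other)

    return (rate(counselled_hard, counselled_other),
            rate(comparison_hard, comparison_other))

counselled, comparison = static_group_rates()
print(f"counselled: {counselled:.1%}   comparison: {comparison:.1%}")
```

Under these assumptions the counselled group shows a repeat rate of about a third against nearly half for the comparison group, a gap produced entirely by self-selection.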

What is needed here is a true experiment in
which, from a large group of eligible VD
repeaters, some are chosen randomly for the counselling
program while others are accorded only the usual
clinic services. There are two types of
experimental designs with slightly different advantages. In
the pretest-posttest control group design,
diagramed

R* O X O
R  O   O

each group is measured prior
to treatment, one group is given the treatment,
and then there is another measure taken
subsequent to treatment. One might, for example,
determine VD rates for the year prior to counselling
and the year following counselling for both a
treated and an untreated group. If the
experimental and control groups are chosen randomly and if
they are reasonably large groups, they should be
very comparable at the time of the pretest. If the
treatment has an effect, they should be different at
the time of the posttests.
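Random assignment and the comparison of the two groups at the posttest can be sketched as follows. This is a minimal illustration, not a prescribed analysis: the group sizes and outcomes are hypothetical (they merely echo the 20% and 40% figures used above), and a permutation test is one of several procedures that could be used to judge whether chance is a plausible explanation.

```python
import random

def randomize(subject_ids, seed=0):
    """Randomly split subjects into equal-sized treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def permutation_p_value(treated, control, n_perm=2000, seed=0):
    """Share of random relabelings whose rate difference is at least as large
    (in absolute value) as the one observed. Outcomes: 1 = returned, 0 = not."""
    rng = random.Random(seed)
    observed = abs(sum(treated) / len(treated) - sum(control) / len(control))
    pooled = list(treated) + list(control)
    n_t = len(treated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_t]) / n_t
                   - sum(pooled[n_t:]) / (len(pooled) - n_t))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return hits / n_perm

treatment_ids, control_ids = randomize(range(200))
# Hypothetical posttest outcomes: 20 of 100 treated and 40 of 100 controls return.
treated_outcomes = [1] * 20 + [0] * 80
control_outcomes = [1] * 40 + [0] * 60
print(permutation_p_value(treated_outcomes, control_outcomes))
```

A small value means that few random relabelings of the same outcomes produce a gap as large as the one observed, so chance becomes an implausible rival hypothesis, though never an impossible one.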

The fact that the two groups can be expected
to differ only at the posttest provides a clue to the
nature of the other true experimental design, the
posttest only control group design, which is
diagramed

R X O
R   O

If subjects are assigned randomly
to groups and if the groups are of reasonable size,
the groups should be quite comparable on the
pretest measure and there is, then, no reason to give
it. There are at least two reasons for not using a
pretest if one is not necessary. First, every measure
costs something, and taking needless measures is
wasteful of project resources. Second, it is at least
possible that an experimental treatment may work
differently depending on whether there has been a
pretest or not, with the consequence that results of
an experiment employing a pretest may be
generalizable only to other settings in which
pretests are used. For example, if one were interested
in the effects of a lecture on subjects' knowledge
about certain aspects of respiration, it is at least
possible that pretested subjects would be more
alert to critical elements in the presentation and
that they would gain more than would be the case
under ordinary conditions of conducting the
course, i.e., without a pretest.

* The Rs here are used to signify that subjects are assigned randomly to
treatment and control conditions.

The essence of experimentation is to define
experimental and control groups in such a way
that they differ only in the treatment to which they
are exposed. Under such circumstances if
experimental and control groups differ following
treatment, it can be inferred with considerable
confidence that that difference was produced by the
treatment. Why, then, if experiments permit such
definite inferences are not more experiments
done? Why is any other design ever used? One
important reason is that many variables cannot be
experimentally controlled, either for practical or
for ethical-moral reasons. In order to be assured
that experimental and control groups differ in no
way other than the treatment, the experimenter
has to be able to produce the treatment when he
wishes or predict its occurrence well enough to be
able to expose subjects to it as desired. One
cannot, for example, cause the President of the
United States to make a speech, but one can
expose subjects differentially to the speech when it
occurs. However, one cannot produce natural
disasters nor even predict them well enough to be
able to expose a randomly chosen set of subjects to
a disaster, even if one wished to do so. The latter
point reminds us that some experiments would be
unethical or immoral. We cannot deliberately
expose subjects to risks to life and limb, we cannot
abuse them psychologically for the sake of science.
The long-term effects of child abuse, for example,
cannot be studied experimentally; we will always
be dependent upon observational data and
quasi-experimental designs.

A second reason why experiments are not
more often done is that preconceptions about the
efficacy of a treatment often limit willingness to
distribute the treatment randomly, administering
it to some and withholding it from others.
Although the history of medicine, along with that of
most other ameliorative professions, is replete with
instances of treatments once thought mandatory
but since abandoned as worthless or even harmful,
e.g., bloodletting, purging, it is still very often the
case that a new treatment is developed and applied
to a few cases with apparently great success so that
any subsequent suggestions of the need for an
experimental test meet immediately with the
objection that it would be unethical to withhold the
treatment from anyone for experimental
purposes. Although Gilbert, Light, and Mosteller
(1975) conclude from a review of experimental
tests of medical innovations that on the whole one
would be better off to have been in the control
groups, convictions about the worth of new
treatments develop rapidly and become quite strong.
The same can be said for many treatments having
to do with the delivery of health services. Mobile
coronary care units, outreach programs,
comprehensive health care, and the like are services
which are likely to be assumed to be valuable and
hence not researchable by experimental methods.
Consequently their real worth often remains
unknown although great amounts of money are
being spent in implementing them on a
widespread basis.

There are many other reasons why
experiments do not more often get done, including the
fact that the desirability of and need for a
well-controlled experiment is often unrecognized; but
it should also be noted that a good many more
experiments get planned than ever are brought to a
successful conclusion. Experiments in the social
arena, in real life, are not easy to do, and many a
good, well-planned experiment falls victim to
various methodological and procedural ills during its
course and ends up less adequate than was ever
intended or even imagined. Despite the best laid
plans, random assignment breaks down, e.g.,
because the total number of cases available is not
large enough or because some higher authority
insists on subverting randomization for political or
personal reasons. Control groups often get
contaminated when some aspects of the treatment
program get implemented in the control group as
well. Sometimes out of sheer carelessness subjects
are transferred back and forth between groups or
important changes are made in the experimental
treatment in midcourse. Social experimentation is
never easy, which is all the more reason to plan
and strive for the best experiments possible.
Methodological compromises in research are
always in a downward direction.

Quasi-experiments

Despite the positive plea which can be made
for true experiments, it is the case that
compromises do often have to be made. True
experiments cannot always be planned for, and even
when they are, events often force compromises
that weaken them and that later demand some sort
of shoring up. When, for whatever reason, it
proves impossible to do a true experiment, there
still are alternatives that are better than no
systematic investigation at all. The so-called
quasi-experiments are nearly always less conclusive than
a true experiment because they do not permit the
ruling out of all plausible rival hypotheses, but by
careful planning of them and judicious use of
information obtained, often by combining results
from several studies, it has often been possible to
arrive at findings which are reasonably persuasive
to people willing to be persuaded at all.

However, in our view, a quasi-experimental
approach to a problem usually proves to be time
consuming, expensive, uncertain, and ultimately at
least a bit disappointing. A good case in point is
the attempt that has been made over the past
twenty years to link cigarette smoking to cancer
and other health problems. A long period of time
has been required to reach our present position,
and the expenditure of money on various
investigations has been enormous. And still we are in a
position of uncertainty of at least great enough
proportions that those people who do not want to
believe that tobacco is hazardous to health can
argue with the evidence. A true experiment could
never have been done, i.e., assigning on a random
basis some group of youth to be taught to smoke
and some other youth to be in an abstinence
condition, but the forced reliance on weaker alternatives
and the consequences of that reliance indicate
clearly the disadvantages of the
quasi-experimental approach.

The point also should be made that weak or
bad research is expensive at almost any price
because it does not lead to any conclusions. A good
case in point is the series of attempts which have
been made over the years to evaluate federal
manpower programs, e.g., Job Corps. There have been
24 evaluations conducted over a period during
which $12.5 billion has been spent on manpower
programs, and in a review of those 24 evaluations
The Urban Institute concluded that the various
studies which have been done are so faulty in
design and execution that neither singly nor in
aggregate do they provide any basis at all on which
a policy maker might arrive at a decision about the
worth of manpower programs (Nay, et al., 1973).
That is expensive research. Unfortunately many,
many more examples could be adduced. Whenever
one can be done, one good experiment is likely to
be worth more than almost any number of
alternatives.

When the true experiment is not possible,
there are a number of alternatives of varying
characteristics and value which are very well
described by Campbell and Stanley (1963). Space
does not permit the explication of more than two
or three examples of the designs which Campbell
and Stanley present, but we would like to illustrate
some of the possibilities and problems. Before
proceeding perhaps it would be useful to list the
most common plausible rival hypotheses which can
threaten the validity of an experiment conducted
without randomization, the list being taken from
Campbell and Stanley (1963).

History, those events, other than the
experimental variable but occurring during the same
period of time, that might account for any change.
For example, a television interview with a local
sheriff about the 911 system could jeopardize an
experimental public information campaign,
especially if the program were broadcast in an
"experimental" area and not in a "control" area.

Maturation, the fact that things normally
change over time. There is an old saying in
medicine that with proper treatment a patient will
recover from a cold in about a week; otherwise it
takes seven days.



Testing, the possibility that taking some
measurement will in itself produce a change upon some
subsequent occasion. If EMTs are anxious about
performing some procedure because of its
unfamiliarity, they may be less anxious and produce
different results on a second testing without
regard to any actual change in skill.

Instrumentation, the changes that can occur in
an instrument or recording process over time and
be mistaken for experimental effects. For
example, if changes are made in a record system or if
criteria for eligibility for a service are changed, an
unknowing investigator might be led to a mistaken
conclusion. In one fire department a cutback in
personnel assigned to each engine led to the
upgrading of many fires from two to three alarms,
i.e., more engines are dispatched in order to keep
the number of men present at a fire at a constant
level.

Statistical regression, a somewhat technical
matter having to do with the fact that if cases are
selected for observation on the basis of extreme
scores or conditions, there is almost certain to be a
shift toward less extreme values on a subsequent
remeasurement. The ten "worst" hospitals in a
state will almost certainly appear to have improved
if looked at again in a year while the ten "best" will
not look quite so good.
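The hospital example can be reproduced with simulated scores. The sketch below assumes, purely for illustration, that each hospital's observed score is a stable true quality plus independent yearly noise, and that nothing whatever changes between the two years; the spreads of 5 and 10 points are hypothetical.

```python
import random

def regression_demo(n_hospitals=200, seed=3):
    """Score hospitals twice with NO real change and compare the extremes."""
    rng = random.Random(seed)
    true_quality = [rng.gauss(50, 5) for _ in range(n_hospitals)]
    year1 = [q + rng.gauss(0, 10) for q in true_quality]
    year2 = [q + rng.gauss(0, 10) for q in true_quality]

    # Select the extremes on the basis of the first measurement only.
    ranked = sorted(range(n_hospitals), key=lambda i: year1[i])
    worst, best = ranked[:10], ranked[-10:]

    def mean(scores, idx):
        return sum(scores[i] for i in idx) / len(idx)

    return {
        "worst_year1": mean(year1, worst), "worst_year2": mean(year2, worst),
        "best_year1": mean(year1, best), "best_year2": mean(year2, best),
    }

result = regression_demo()
for name, value in result.items():
    print(name, round(value, 1))
```

The ten "worst" hospitals score higher on the second measurement and the ten "best" score lower, although no hospital's true quality has changed.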

Selection biases, which arise in determining that
some persons get a treatment and that others do not,
can render observations uninterpretable or misleading. For
example, there is some indication that in early
trials of certain surgical procedures only patients
in good enough condition to survive the surgery
were included in the experimental groups while
the comparison groups included many patients in
poor condition, thus making the surgery appear
more successful than it was.

Experimental mortality, referring to differential
loss of cases from experimental and comparison
groups, as might occur in the comparison of a
voluntary experimental insurance program with a
standard program.

It is, of course, true that two or more of the
above problems might exist within any one
investigation and that they might interact in some ways to
make the problems even worse. It should also be
recognized that the threats to the validity of
quasi-experiments can as easily obscure as enhance
differences, thus creating the possibility that a
treatment might erroneously appear worthless as
well as erroneously appear valuable.

The Nonequivalent Control Group Design. One
commonly encountered quasi-experimental
design, and an understandably attractive one,
involves comparing a group which receives an
experimental treatment of some sort, under
conditions seen as not permitting random assignment of
subjects, with a group from which the treatment
is withheld. The investigator will often anticipate
objections that whatever he finds might have
occurred without the treatment, e.g., because of
other, broader community changes. Under those
conditions it is desirable to have some group with
which to compare the experimental group to try to
determine whether the changes found are greater
than would be expected in the natural course of
events. Investigators will very often cast about in
search of a comparison group of some sort, usually
a group with characteristics highly similar to those
of the experimental group. To the extent that the
groups are similar, then comparisons will be
revealing. However, similarity must often be more
assumed than demonstrated, and even where some
similarity can be demonstrated, e.g., by
demographic comparisons, there may be strong residual
doubts if the experimental group is special in the
way they were recruited into the experiment.
Thus, for example, if the experimental group
consists of all the employees of a factory who
volunteer for a new type of health insurance program, it
may be very difficult to develop any assurance that
any comparison group can be formed which would
be similar enough to make a conclusion possible.
If, on the other hand, the experimental group
consisted of the clerical workers in Division A, a
comparison group formed by the clerical workers
in Division B might be quite useful if there seemed
to be no particular reasons why workers were in
one Division or the other and if working
conditions in the two Divisions seemed very much the
same. The value of the non-equivalent comparison
group will depend upon the case which can be
made for similarity to the experimental group on
factors critical to the dependent or outcome
measure.

The Separate Sample Pretest-Post-test Design.
Another research design that is rather frequently
encountered in the health field and that illustrates
some of the gains as well as shortcomings of
quasi-experimental designs is the separate sample
pretest-post-test design, number 12 in Campbell
and Stanley's (1963) series. It very often happens
that some desired intervention is difficult to apply
to an isolated sample, but rather must be applied
to an entire population. A good example is a
public educational campaign carried out over mass
media. One cannot isolate a sample to be exposed
to the campaign carried out over mass media.
Another example occurs if an emergency rescue
service changed its dispatch procedures at some
point in time, it being improbable that the
procedures could be changed for only a random sample
of calls. In such cases one might seek a comparison
sample, e.g., a sample of individuals from a
community not exposed to the educational campaign,
or a sample of rescue dispatch records from
another emergency rescue service. However,
another possibility might be to obtain the
responses from a sample of individuals in the
community prior to the mass media effort and a
second sample following the effort. If there is a
systematic difference between the responses of the
two samples, perhaps it may be an effect of the
campaign. The reasonableness of that hypothesis
depends upon the confidence which one has in the
assumptions that the population from which the
samples were drawn did not change over time and
that no other events occurred in the community
which might have accounted for the response
change. Thus, for example, in a survey of business
firms concerning their victimization by crime, if
the time elapsing between the first and second
surveys is very long, the population of businesses
available to be surveyed may have changed as some
businessmen move out and others move in. Or if
unemployment rates change from the time of the
first to the second survey, crime rates may change
quite independently of any police activity and
either obscure or enhance the apparent effects of a
police program. The separate sample
pretest-post-test design is obviously not ideal, but it may
have some utility when elapsed time is brief and
when, luckily, there do not appear to be any
dramatic intervening events which might have
produced the apparent experimental effect.

Time series designs. One additional design which
may be useful to note is the time series, a research
design which can be implemented when one has an
opportunity to make a series of baseline
observations prior to the introduction of some
programmed change, and a subsequent series of
comparable observations. For example, if a hospital
emergency room wished to institute and test a new
method of handling possible fracture cases in
order to minimize unnecessary radiography, and if
records on radiographic procedures and positive
and negative results for discovery of fractures
were available by week for a period of a year prior
to the change and could be accumulated weekly
for a year or so following the change, there would
probably be adequate data for a time series
analysis. Any change from the pre-experimental
to the experimental period might well be
attributed to the intervention. However, the
interpretation of findings is often not simple. To begin
with, the number of observations or data points
needed on either side of the intervention is
sizeable in most cases because of the fluctuations
which normally occur and have to be dealt with.
Seasonal changes or other cyclic changes pose
problems, e.g., in a wintry area there might be
many more cases in the winter with possible
changes in base rates of genuine fractures, obvious
fractures, or whatever. Moreover, if the
experimental change only has a gradual effect because
of being phased in or because of taking time to
develop, the gradual change in the post-intervention
period may be difficult to interpret as an effect of
the change rather than as naturally occurring
change. One would also want to be assured that
only the critical change occurred during the
intervention period. Thus, for example, if not only the
method of processing fracture cases but the
radiologist changed, the effects might be difficult
to disentangle.

It is also apparent that an experimental
intervention may have a wide variety of effects, and
those effects will differ in the ease with which they
may be detected. For instance, the following are
but some of the effects possible (see Fig. A):

(a) The initial effect is small but cumulative, as
might be the case for the effects of a
training program on income. The effects in the
early years would be small but might well
grow in size over time.

(b) The initial effect is fairly marked, but
there is a fast return to original levels. An
example might be the effect of a refresher
training program in the schools, with
personnel showing an immediate and perhaps
substantial improvement in performance
but followed by a quick loss and return to
normal behavior.

(c) There is an immediate, discrete change
which is maintained over time, e.g., the
instituting of an improved communications
system might have an immediate effect on
rescue response time with little if any
further change.

(d) In a situation in which some behavior is
changing gradually over time, there is an
experimental intervention which slightly
displaces the level of the behavior being
observed without having any effect on rate
of change. An example here might be the
effect of some brief training program
introduced in the context of gradually
improving skill, such as might occur if a
group of EMT students were shown a
couple of nonobvious handy tricks in the
handling of some items of equipment.

The above are only some of the possibilities;
there are many more. Detection of changes in a
time series is not an easy task, and the statistical
tools needed for that detection are still in the
process of being worked out (cf. Glass, Willson, &
Gottman, 1975). Interpretation of a time series can
often be improved if multiple time series can be
prepared, e.g., if a comparison group is available with
the experimental intervention introduced at a
different point or if a comparison group never
exposed to the experimental intervention can be
studied. Such comparison groups can help to rule
out the possibilities that factors extraneous to the
experiment, such as broader community changes,
mass media campaigns, maturational processes or
whatever might have been responsible for the
observed changes.
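The four outcome patterns (a) through (d) listed earlier can be generated as idealized series, which may help in picturing what each would look like in a set of records. The sketch below is noise-free and its magnitudes are arbitrary illustrations; real series would also carry week-to-week fluctuation and seasonal cycles, which is precisely what makes detection difficult.

```python
def outcome_series(pattern, n_pre=10, n_post=10, baseline=100.0):
    """Return an idealized series for outcome pattern 'a', 'b', 'c' or 'd'."""
    if pattern not in ("a", "b", "c", "d"):
        raise ValueError("pattern must be one of 'a', 'b', 'c', 'd'")
    series = []
    for t in range(n_pre + n_post):
        k = t - n_pre  # periods elapsed since the intervention (negative = before)
        # Only (d) has behavior already changing gradually before the intervention.
        trend = 1.0 * t if pattern == "d" else 0.0
        if k < 0:
            effect = 0.0
        elif pattern == "a":    # small but cumulative
            effect = 1.0 * (k + 1)
        elif pattern == "b":    # marked at first, then a fast return to baseline
            effect = 10.0 * (0.5 ** k)
        else:                   # (c) and (d): immediate shift that persists
            effect = 10.0
        series.append(baseline + trend + effect)
    return series

for p in "abcd":
    print(p, outcome_series(p, n_pre=3, n_post=4))
```

In pattern (d) the slope is the same before and after the intervention and only the level is displaced, which is why a simple before-and-after comparison of averages would overstate the effect there.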

Fig. A. Different Time Series Outcomes

The requirement of rather long pre- and
post-experimental observation series represents a
fairly stringent limitation on the usefulness of time
series designs since it is not often the case that it is
possible to plan for and collect data weekly for up
to a year prior to and subsequent to an
experimental intervention. However, there are many cases in
which ongoing records may be exploited in order
to obtain baseline data so that the experimental
intervention can be implemented immediately, the
limitation being that no change in recording
procedures can have occurred or be tolerated from
the beginning of the baseline period to the end of
the experiment. If that requirement can be met
and if the records contain information satisfactory
for judging the success of the program, the time
series design can be quite useful and often a
reasonable substitute for a true experiment.

Accepting the null hypothesis.

It is in the nature of the evaluation of
experimental treatments and interventions that one very
often wishes to be able to demonstrate that the null
hypothesis is tenable, i.e., that it is reasonable to
believe that two treatments do not differ in
outcome. That is particularly likely to be the case
when one wishes to show that a new and simpler or
less expensive program produces results as good as
those produced by an established program. It is
not necessary to demonstrate that the new
treatment is better than the old one, only that it is
equally as good. For example, paramedical
personnel only need to be able to handle medical
problems as well as more expensive physicians; a
new and simpler suture need only be as good as
the established procedure; a six week training
program need only be as good as a ten week
program. In traditional science there has been a
predominant concern with mistakenly accepting a
hypothesis which will later prove to have been
wrong, because traditional science proceeds, and
can afford to proceed, in a gradual, orderly
manner, with findings being checked regularly by
other investigators. However, in evaluating social
programs it may be equally as harmful mistakenly
to conclude that a program is ineffective as to
conclude mistakenly that it is effective. Once a
program is shown, however erroneously, to be
ineffective, it may be abandoned and never tried again.

There are serious problems involved in attempting to show that two
programs or treatments are equal in their effects. In the first place,
strictly speaking it is improbable that any two treatments are exactly
equal. Consequently, the likelihood of determining that they are
unequal will depend on the precision with which the experiment is done
and the number of cases studied. However, the more carefully an
experiment is done and the larger the scope of the study, the more
likely it is that a difference will be found but that the difference
will be of trivial practical importance. Conversely, the smaller and
more carelessly done an experiment is, the greater the probability
that the conclusion that two treatments do not differ will be reached.
The difficulty is that the conclusion that there is a difference can
usually be reached with a fair degree of certainty; the conclusion
that there is no difference is almost always more weakly supportable.
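The dependence of the verdict on the number of cases studied can be illustrated with a short Python sketch. The success rates below are invented, and the pooled two-proportion z statistic is a standard textbook choice used here for illustration, not a method named in this paper. The same one-point difference is far from detectable with 100 cases per group and unmistakable with 100,000.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled z statistic for the difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / standard_error

# Hypothetical success rates of 81% versus 80% in both comparisons.
small_z = two_proportion_z(81, 100, 80, 100)
large_z = two_proportion_z(81_000, 100_000, 80_000, 100_000)

# |z| > 1.96 is the conventional two-sided 5% criterion.
print(f"100 cases per group:     z = {small_z:.2f}")
print(f"100,000 cases per group: z = {large_z:.2f}")
```

Neither result says anything about whether a one-point difference matters in practice; that judgment has to be made separately from the statistical test.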



Still, investigators, and the consumers who use their work, do often
arrive at acceptance of the likelihood that there is no practical
difference between two programs or treatments. The research outcomes
associated with that sort of a conclusion need to be better
understood, but several factors seem to be involved in acceptance of
the "no difference" conclusion. First, acceptance of the null
hypothesis is facilitated by fairly large scale, carefully conducted
studies. If one wished to be able to conclude that paramedical
personnel can handle certain emergency procedures as well as
physicians, the study should not be carried out on a small number of
paramedics and physicians, nor should it be undertaken without careful
attention to measurement problems, definition of cases, etc. Second,
general acceptance of the null hypothesis is more likely if the
conclusion of no difference has a strong, logical inferential base. It
is easier to believe that two programs are equal if there are no
powerful reasons to believe that they should be different. One might
well believe that general surgeons would do as well as specialists in
carrying out routine appendectomies; it would be difficult to believe
that they would do as well as specialists in carrying out
neurosurgery. Third, the null hypothesis is rendered more acceptable
if a large number of widely varying measures showing no difference are
obtained. If only one or two variables are studied, it is easy for the
doubter to insist that a more assiduous search for differences would
have uncovered them. In the Kansas City police patrol experiment, for
example, it was concluded that types of patrol do not differ in their
effects. That conclusion is sufficient to warrant changes in patrol
strategies to capitalize on opportunities for redeployment of
personnel. The persuasive feature of the results is that many
different possible outcome measures were examined, and there was no
consistent pattern obvious in the few differences that were found.

It is not easy to gain acceptance of the null hypothesis, and it can
never be proven, but it is not impossible to establish it as a
reasonable conclusion when that seems desirable and consistent with
the findings.



In favor of strong treatments.

If one has a program that one believes to be effective and if one
wishes to establish that effectiveness by an experimental trial, there
is one recommendation which, above all others, is likely to maximize
the chances of getting the desired outcome. That recommendation is to
devise and implement the treatment in a strong form. Probably as much
as any other factor it is the weakness of experimental treatments that
forces us to the conclusion that they are of no value. For example, it
is nearly pointless to attempt to evaluate a training program that is
poorly planned, carried out by inexpert instructors, and ill attended
by trainees. It is true that those might be characteristics of
eventual implementations of the program when it is actually put into
practice, but ordinarily we want to know whether a training program
will be effective when it is done right. After that has been
established, it may then be worth determining whether inexpert
instructors can carry out the training, etc.

If a treatment is delivered in a strong, optimal form, then
conclusions are likely to be fairly clear cut. The program will either
produce sizeable effects which will be evident in spite of design and
measurement problems, usually without the need for any fancy
statistics, or it will be clear that the treatment does not do very
much. If it does not work well in its strongest form, it will almost
certainly not do anything at all under field conditions.

Even in simple pre-experimental designs such as those involving a
pretest, a treatment, and a post-test given to one group only, i.e.,
O X O, a striking change, especially if it is consistent across all
the cases, will often be quite persuasive. If almost no trainees can
do CPR properly before a training program and almost all of them can
do it very well afterwards, no control group would be needed. However,
if the difference is not great, i.e., the treatment does not have a
strong effect, the possibility that the pre-test alone might have
produced the final difference might not be unreasonable. Or if some
new burn treatment seems to work well in just about all cases on which
it is tried, a case for its effectiveness may be made even despite the
absence of a control group. However, the problem is to develop a
strong treatment and to be able to deliver it consistently. Unless one
is quite confident of being able to meet those criteria, it is much
better to rely on more powerful experimental designs with comparison
groups.


Feasibility of Experimentation in Social Action 
Programs 

How feasible and useful are even such quasi-experimental designs in
the context of social action programs? Boruch (1974) has documented
more than 200 experiments which illustrate the variety of social
programs which have been subjected to experimental field test. A
number of interesting approaches have been used in these experiments
in order to obtain randomized assignment. Campbell (1969) argues that
randomization might be very reasonable to use in the social setting.
The randomization unit might be persons, families, precincts, or large
administrative units. Where resources are scarce and are not available
to all, randomization is perhaps the most democratic way of making
them available or testing them in social programs. The necessity of
introducing pilot projects and staged innovations also permits the use
of random assignment as the best way of assuring equality and fairness
to all social groups.
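Random assignment of units, whatever the unit happens to be, is mechanically simple. The following Python sketch is an illustration under stated assumptions rather than a procedure taken from Boruch or Campbell: the precinct labels are invented, `randomly_assign` is a hypothetical helper, and an even split into two groups is assumed. A recorded seed lets the assignment be reproduced and audited later.

```python
import random

def randomly_assign(units, seed=None):
    """Shuffle the units and split them into treatment and control groups."""
    rng = random.Random(seed)   # a private generator, so the seed is explicit
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical randomization units; persons or families would work the same way.
precincts = [f"precinct-{i}" for i in range(1, 11)]
treatment, control = randomly_assign(precincts, seed=1976)

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because chance alone decides which units receive the scarce resource, no group can claim that the choice reflected professional or political preference.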

Despite all this, it is often the case that social action programs are
unable to find appropriate random groups to serve as controls in
experiments. In such situations, it would be appropriate, in a
quasi-experimental situation, to find reasonably comparable and equal
comparison groups. There are obvious problems with this, for service
must be denied to certain sectors of the constituency, which results
in the problem of most policy-makers wanting to assign people to
treatment on the basis of their professional or political knowledge
and experience. Such expediency destroys randomness or comparability
and makes for difficult generalizations. Of equal importance is the
problem of obtaining suitable controls and the social problem of
dealing with angry, aggrieved, and distraught subjects who have been
treated as controls with placebo treatments. Social action programs
tend to hold out high expectations and considerable political
commitments and biases due to the preconceptions and heartfelt
convictions on the part of their proposers. In such situations,
administrators often find themselves trapped in advance in the need to
prove the efficacy of the reform that has to be evaluated without
being able to conduct an honest experiment to find out its true value
(cf. Campbell, 1969). Such political pressures need to be handled with
honesty and forthrightness. It would be wrong to use biased analysis
in order to demonstrate the usefulness of a reform that has been
implemented.

Perhaps the most difficult fact for administrators and policy-makers
to accept is that single experiments rarely prove or disprove the
utility of a particular approach. The essence of good research design
and statistical analysis is to be able to demonstrate that one and
only one known variable could reasonably have produced the observed
outcome, but any one study is likely to be so narrow or specific in
the program tested, or population studied, or outcome observed that
any final, unequivocal conclusions would almost always be unwarranted.
That is a state of affairs that can prove very frustrating even to a
program evaluator, let alone to an administrator who must make a
decision. Scientists generally hope that a cumulative model might be
used in social action experiments in order to demonstrate their
long-term utility. The recent experience of evaluation of social
action programs has demonstrated a lack of comparable outcomes from
different programs, [...] to insure that one program utilizes the
experience and [...] previous one. The very nature of large-scale
investments in society requires that little overlap occur,
particularly where redundant and not so useful approaches have
previously been tried. Thus, later programs tend to be essentially new
and thereby give the impression that previous approaches have been
condemned by implication. The fact is that little information tends to
be gathered about previously tried approaches. Thus, the process of
successive approximation is hampered.

A note about correlational studies.

There is probably no methodological and epistemological warning more
often encountered than that "correlation does not equal causation."
There is probably also no warning more needed. The medical field has
many areas and problems that are recalcitrant to good experimental
design, whether for practical or ethical reasons, and in those areas
the temptation at least to collect correlational data is seemingly
irresistible. Many of the correlations are fascinating enough, but few
of them provide any basis on which to make policy, and not a great
many more provide any basis for improved understanding of the basic
processes which are at work in the field. This is not to insist that
correlational data should never be collected, nor that such data are
invariably worthless. Rather it is to serve as a reiteration of the
warning and an encouragement to try to think through in advance the
implications of a study involving correlational data.

Perhaps it is worth a line or two to explain that by correlation is
meant the observation of covariation, of the relatedness of two or
more variables. A correlation may involve observation of two variables
as they change over time, or it may involve the values of one variable
as a function of the values of another. For example, weight and blood
pressure may be measured and correlated in a single individual over
time, let us say by obtaining measures of both on a weekly basis.
Alternatively, weight and blood pressure may be measured at the same
time in a number of different individuals. Correlations may be
positive, meaning that a large value on one is associated with a large
value on the other, with medium and small values being similarly
associated. Blood pressure and weight are likely to be correlated
positively in a large sample of persons. Correlations may also be
negative, meaning that a large value on one is associated with a small
value on the other and vice versa. Correlations between age and health
status are likely to be negative, i.e., older persons have worse
health. Correlations may also be essentially zero, i.e., indicating no
relationship. There is probably no correlation between height and
occurrence of myocardial infarction in adult males. Correlations may
vary from rather large, indicating strong relationships, to near zero,
indicating weak relationships.

The point of the above is to indicate that correlations only indicate
that two sets of observations are related in the sense that the values
of one are some function of the values of the other. There is no
indication from the correlation itself why the relationship exists.
The assumption that one variable causes the other may or may not be
correct or even reasonable. There is usually no way to be very sure
without a great deal of additional information, and even then, as the
smoking-lung cancer debate informs us, certainty is limited.
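The three kinds of correlation described above can be made concrete with a short Python sketch. The Pearson product-moment coefficient is assumed here as the measure of correlation, since the paper does not name one, and all of the figures are invented solely to produce a positive, a negative, and a zero coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_variation / math.sqrt(var_x * var_y)

# Invented measurements on five hypothetical persons.
weight = [60, 70, 80, 90, 100]
blood_pressure = [110, 118, 125, 131, 140]   # rises with weight
age = [20, 35, 50, 65, 80]
health_rating = [9, 8, 6, 5, 3]              # falls with age
height = [165, 170, 175, 180, 185]
unrelated = [2, 1, 4, 1, 2]                  # no linear relation to height

r_positive = pearson_r(weight, blood_pressure)
r_negative = pearson_r(age, health_rating)
r_zero = pearson_r(height, unrelated)
print(round(r_positive, 2), round(r_negative, 2), round(r_zero, 2))
```

Nothing in the computation refers to why the two lists vary together, which is exactly the limitation the text describes.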

The problems with interpreting correlations can, perhaps, best be
illustrated with some examples:

— It has been found that the more often a surgeon performs a given
  procedure, the better the results he gets. Should we then encourage
  surgeons who do not operate very often to do more surgery? Or is it
  possible that the better a surgeon is, the more referrals he gets?

— It has been found that teaching hospitals produce better outcomes
  for a wide variety of medical and surgical cases. Should we then
  encourage all hospitals to institute teaching programs? Bigger
  hospitals also get better results. Should smaller hospitals add
  beds?

— One study reported that the faster the travel time of a rescue squad
  from the scene of the emergency to the hospital, the lower the
  probability of survival of the patient. Should emergency vehicles
  then travel slower? Or isn't it possible that the more desperate the
  case, the faster the driver will go?

— The Statistical Bulletin of Metropolitan Life Insurance Co. has
  reported that among major league baseball players third basemen have
  had the lowest mortality ratios, and pitchers and first basemen have
  had the highest mortality ratios. Is there a clue there for the
  parents of Little Leaguers?

The above examples were deliberately chosen as somewhat extreme, but
they do illustrate the hazards of attempting to interpret
correlational data. More subtle examples could as easily have been
chosen, a representative one being the observation that the more years
of experience a policeman has, the more cynical he is. Does police
work breed cynicism, or do only the cynical survive in the police
force? Experienced hang glider pilots have more fatalities than the
inexperienced. They probably also fly more and take more risks.
Teenage boys have more auto accidents than girls. More reckless? Less
skilled? Or is it because they drive more miles?
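The rescue squad example can be worked through with invented numbers. In the Python sketch below, which is an illustration and not data from the study mentioned, speed has no effect on survival within either severity group, yet fast trips show much lower survival overall simply because desperate cases are the ones driven fast. All counts and the helper names are hypothetical.

```python
def make_trips(severity, speed, n_trips, n_survived):
    """Build n_trips records of (severity, speed, survived)."""
    return [(severity, speed, i < n_survived) for i in range(n_trips)]

# Invented counts: survival is 30% for critical cases and 90% for minor
# cases regardless of speed, but critical cases are mostly driven fast.
trips = (
    make_trips("critical", "fast", 80, 24)
    + make_trips("critical", "slow", 20, 6)
    + make_trips("minor", "fast", 20, 18)
    + make_trips("minor", "slow", 80, 72)
)

def survival_rate(severity=None, speed=None):
    """Survival rate among trips matching the given severity and speed."""
    selected = [
        survived
        for trip_severity, trip_speed, survived in trips
        if (severity is None or trip_severity == severity)
        and (speed is None or trip_speed == speed)
    ]
    return sum(selected) / len(selected)

overall_fast = survival_rate(speed="fast")
overall_slow = survival_rate(speed="slow")
print(f"all cases:      fast {overall_fast:.2f}  slow {overall_slow:.2f}")
for severity in ("critical", "minor"):
    fast = survival_rate(severity, "fast")
    slow = survival_rate(severity, "slow")
    print(f"{severity:<15} fast {fast:.2f}  slow {slow:.2f}")
```

The overall comparison would suggest slowing the ambulances down; the comparison within each severity group shows that speed made no difference in these invented data.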

It is true that more powerful statistical techniques for dealing with
correlational data are currently being developed and studied, but
their use is as yet of questionable value. Our best judgment at this
time is to avoid trying to base conclusions about causal relationships
on mere association between variables.



References 



Campbell, D.T. & Stanley, J.C. Experimental and quasi-experimental
designs for research. Chicago: Rand McNally, 1963.

Cochran, W.G. Research techniques in the study of human beings.
Milbank Memorial Fund Quarterly, 1955, 33, 121-136.

Cook, T.D. & Campbell, D.T. The design and conduct of
quasi-experiments and true experiments in field settings. In M.D.
Dunnette (Ed.) Handbook of industrial and organizational psychology.
Chicago: Rand McNally, 1976, pp. 223-326.

Gilbert, J.P., Light, R.J., & Mosteller, F. Assessing social
innovations: an empirical base for policy. In C.A. Bennett & A.A.
Lumsdaine (Eds.) Evaluation and experiment: some critical issues in
assessing social programs. New York: Academic Press, 1975, pp. 39-193.

Glass, G.V., Willson, V.L., & Gottman, J.M. Design and analysis of
time series experiments. Boulder, Colo.: Colorado Associated
University Press, 1975.

Kelling, G.L., Pate, T., Dieckman, D., & Brown, C.E. The Kansas City
preventive patrol experiment: a technical report. Washington, D.C.:
The Police Foundation, 1976.

Nay, J.N. et al. Benefits and costs of manpower training programs: a
synthesis of previous studies. Washington, D.C.: The Urban Institute,
1973.

Peterson, O.L. An analytical study of North Carolina general practice,
1953-1954. Journal of Medical Education, 1956, 57, 1-165.

Footnotes

1. The author is indebted to Ayres D'Costa for assistance and advice
in preparing this paper.

2. A recently completed but not yet published Police Foundation study
carried out in San Diego indicates very strongly that one-officer cars
are safer and more efficient than two-officer cars.






Social Attitudes and
Program Evaluation

Russell D. Clark, III
Associate Professor of Psychology
Florida State University
Tallahassee, Florida



There is probably no one approach to evaluating programs that is more often and more widely used than attitude measurement.
Attitudes of trainees are assessed, public attitudes are tapped, attitudes of administrators are inquired after, and so on. Yet, as this
paper makes clear, there are serious limitations to the usefulness of attitude measures, and evaluators should probably never rely
solely on those measures.



The concept of attitude has been regarded as the most distinctive and
indispensable concept in American social psychology (Allport, 1935).
In fact, it is still today the most widely used single term in all the
behavioral sciences (Berkowitz, 1972). The original impetus for the
study of attitudes was, and is, that they are believed to have
something to do with how people act or behave. For example, the
statement "the actions of the individual are governed to a large
extent by his attitudes" explicitly assumes that what people say is a
good indication of what they will do. In theory, by the use of
well-constructed questions and answers to them it is possible to
obtain a great deal of information about an individual's past actions,
his or her current beliefs, and even intended future actions in a
relatively short period of time, and then use this information to
predict what the individual will in fact do in a particular situation.

Guided by these assumptions, social psychologists have gone about
investigating the attitudes of a large part of the world's population.
For example, attitudes about politics, race, war, money, work, sex,
religion, communism, health, and so forth are constantly being
reported in sources ranging from scholarly articles and books to daily
newspapers. This information is not only made available to nearly
everyone, but it unquestionably affects our lives in important ways.
Politicians often change their views (at least as verbally expressed)
to conform to the mood of the people as revealed by opinion polls. It
is not even uncommon to find the latest returns in politicians'
pockets. Economists study consumer buying intentions, and businesses
spend millions of dollars trying to find out the public's reaction
before either naming a new product or finding out what is the best way
of presenting the product so that many people will actually buy it. In
fact, the concern with knowing what people's attitudes are is so
pervasive that it runs throughout the personal, private, and public
sectors of our culture. The importance of attitudes as a concept is
further reflected by the fact that we often change our own attitudes
in response to information about attitudes of others.

So, the social psychologist and the layman are alike in their interest
in attitudes because they are thought to provide a basis for
predicting overt behaviors. It is further assumed that attitudes can
accurately be measured.

The organization of this paper is as follows: (1) a consideration of
what is meant by attitudes; (2) a discussion of how social
psychologists go about measuring attitudes; and (3) a careful look at
the fundamental assumption underlying the study of attitudes.

What is meant by attitudes?

Nowhere is there more disagreement in social psychology than in the
definition of an attitude. In 1939 there were 30 separate definitions
in use. Today there are probably more than 100. Rather than dwell on
the numerous different definitions of attitude, I will define attitude
and its characteristics in a way that most social psychologists would
agree with. By attitude is meant a disposition to respond to some
social object in a negative, neutral, or positive manner, i.e., one is
set to respond for or against something. That something may be a
system of beliefs, political party, automobile, certain other persons,
an institution, group, value or ideal, or one's own body. Attitudes
have the following characteristics:

1. Consistency. The most basic and fundamental evidence for attitudes
is a pattern of consistency in responses to some social object. Let us
see what is implied by consistency. Suppose one day during lunch you
observe a man being rather arrogantly rude to his waiter. Why? Well,
perhaps the man is in a bad mood, perhaps he just lost his job or
loved one, or perhaps the waiter just delivered cold soup and a warm
martini. Now suppose further that during the next ten days you eat
lunch in this restaurant and every day you notice the same rude
behavior toward whoever is serving the man. You might conclude that
the man feels superior to waiters. If so, what you have done is infer
an attitude from consistent behaviors (rudeness) to some social object
(waiter). However, if you were able to observe the same man in
different settings and found that he displayed this same consistency
of rudeness toward a wide variety of people, you might conclude that
he feels superior to most people. That is what is meant by referring
to an attitude as a pattern of consistency in responses to some social
object.

2. Acquired. Attitudes are not innate; they are acquired or learned.
Attitudes are not transmitted through the genes. Infants do not arrive
in the world with preferences for a particular social, political,
economic, or religious orientation; rather an individual's
dispositions toward social objects are a result of the individual's
prior experiences. Whether one feels positive, indifferent, or
negative toward a particular social object depends upon prior
experiences with that object. For example, there is a tendency to like
those social objects which have led to pleasant consequences in the
past and to dislike those social objects which have led to unpleasant
consequences. Pleasant or unpleasant consequences may occur as a
direct result of interacting with a social object or they may occur
vicariously as a result of observing others verbally expressing
pleasure or discomfort when engaged in interaction with an object.
Attitudes can also be taught directly, e.g., as when parents teach
their children to look favorably upon some system of religious
beliefs. It should be apparent that an implicit assumption involved in
viewing attitudes as being a function of learning is that the
formation of attitudes is largely a result of the environment in which
the person lives. More specifically, persons who have lived together
in a particular environment will hold attitudes more similar to each
other than will persons raised in different environments. Thus, on the
basis of being able to identify the political climate of a nation,
state, or different locales within a given state, it is possible to
predict with a fair degree of accuracy whether a conservative,
moderate, or liberal candidate will be elected to office. Similarly,
one can predict how individuals will respond to numerous social
issues.

3. Stability. Once formed, attitudes are stable and endure beyond the
immediate time and place. Attitudes are usually thought of as
relatively enduring. They are not necessarily permanent, but they are
regarded as fairly stable from one day to the next or until some
reason for change occurs. Examples of occasions for change would be
when an individual is no longer rewarded for expressing a certain
attitude, when an individual encounters new experiences which are
inconsistent with prior attitudes, or when an individual is exposed to
new information concerning the attitudinal object.

4. Structure. Attitudes have a conceptual or cognitive structure. By
conceptual or cognitive structure is meant that an individual has
beliefs or opinions about attitudinal objects, e.g., women are more
emotional than men, examinations test only a small part of what we
know, individuals on welfare are lazy, politicians tend to be
dishonest, doctors care more about money than the welfare of the
patient. Our beliefs and opinions tend to be consistent with our
affective dispositions toward attitudinal objects. If one is favorably
disposed toward a particular attitudinal object, beliefs regarding
that object are likely to be positive; if one is unfavorably disposed
toward the same object, beliefs tend to be negative. A person is
scarcely likely, for example, to have a very positive attitude toward
a certain hospital emergency room and also believe that the physicians
there are incompetent. Similarly, having positive feelings toward a
given object will usually lead to an expectation of positive
consequences, whereas negative feelings toward the same object lead to
expectations of negative consequences. For example, a person who is
prejudiced against blacks would be more likely than other persons to
believe that allowing blacks to move into white neighborhoods would
lower property values, lower the quality of education, and make the
atmosphere of the community less pleasant.

5. Intensity and extremity. Attitudes vary in intensity and extremity.
Intensity refers essentially to the strength with which an attitude is
experienced and extremity refers to the degree of favorability or
unfavorability an individual appears to have toward the attitudinal
object. Attitudes vary from low to high intensity and from low to high
extremity. The pattern of consistency in responses to a given social
object should be greatest when the intensity and the extremity of
feelings toward the object are strong. As intensity and extremity
decrease a person is likely to be less consistent in his responses to
the object. Most people, for example, probably have generally
favorable attitudes toward emergency rescue services in their
communities, but since direct experience with those services is
limited, most public attitudes are probably rather poorly formed and
are neither intensely held nor extreme in position. Thus, one could
expect a fair amount of inconsistency in such attitudes, e.g.,
believing that ambulance personnel are generally competent but that
they may discriminate on the basis of race or social class. Weakly
held attitudes are also more susceptible to change so that a single
unfavorable event involving an ambulance company might have a fairly
extensive effect on community attitudes.




Ways of measuring attitudes.

Before it is possible to study the formation of attitudes or attitude
change, and certainly before an individual's behavior can be
predicted, it is necessary to be able accurately to measure attitudes.
Not surprisingly, then, social psychologists have spent a great deal
of time, effort, and money in formulating and developing measures of
attitudes. The most common approaches to attitude measurement are self
reports, indirect methods, physiological measures, and observational
methods.

Self reports. Without question the most common way of measuring
attitudes is simply to ask individuals what their attitudes are. The
typical procedure involves asking individuals to complete an
attitudinal questionnaire which contains numerous positive and
negative statements regarding attitudinal objects. The subject is
asked to agree or disagree with each item or, preferably, to indicate
how much he agrees or disagrees with each item, e.g., strongly agree,
agree, indifferent, disagree, strongly disagree. The underlying
assumption in the latter case is that an individual who agrees is less
favorably disposed toward the object than an individual who strongly
agrees. Similarly, an individual who merely disagrees is assumed to be
less negative toward the object than an individual who reports strong
disagreement. After the questionnaire is completed the investigator
merely sums the scale values and arrives at an overall index
expressing favorability or unfavorability toward the attitudinal
issue. Thus, based upon self reports obtained from individuals, social
psychologists attempt to predict how a given person will behave when
confronted with a particular social object.
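The summing of scale values can be sketched in Python. The sketch rests on assumptions that go beyond the text: the five response labels are assigned the values 5 through 1, the item identifiers are invented, and negatively worded items are reverse-scored so that a high total always indicates a favorable attitude. `score_questionnaire` is a hypothetical helper, not an established instrument.

```python
# Assumed numeric values for the five response categories.
SCALE = {
    "strongly agree": 5,
    "agree": 4,
    "indifferent": 3,
    "disagree": 2,
    "strongly disagree": 1,
}

def score_questionnaire(responses, negative_items=frozenset()):
    """Sum scale values into a single favorability index.

    responses maps an item id to a response label. Items listed in
    negative_items are reverse-scored (an assumption of this sketch),
    so agreeing with a negative statement lowers the index.
    """
    total = 0
    for item, label in responses.items():
        value = SCALE[label.strip().lower()]
        if item in negative_items:
            value = 6 - value
        total += value
    return total

# One hypothetical respondent answering four items.
answers = {
    "q1": "strongly agree",     # positive statement
    "q2": "agree",              # positive statement
    "q3": "strongly disagree",  # negative statement
    "q4": "indifferent",        # negative statement
}
index = score_questionnaire(answers, negative_items={"q3", "q4"})
print(index)  # possible range for four items is 4 to 20
```

The index is only as good as the two assumptions discussed in the text: that respondents know their attitudes and that they report them honestly.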

In developing attitudinal questionnaires the investigator assumes, or
determines, that the individual items are either positive or negative
concerning a social object and that if individuals agree (disagree)
with one particular positive item they will tend to agree (disagree)
with all other positive items. In general, these assumptions are
correct. Persons judging items with respect to a particular issue can
agree on which items favor the issue and which do not. Moreover,
research on attitudes has shown that if an individual is favorable
toward one pro item, he or she tends to be favorable toward other pro
items, and the converse is true for con items. In short, psychologists
have been able to develop questionnaires incorporating both pro and
con items on a given attitudinal issue, and there is a tendency for
individuals to be consistent in their agreement or disagreement with
the individual items.



In addition, the self-report methods of measuring attitudes make two
key assumptions. First, it is assumed that a person knows how she or
he feels about a particular social object. Second, the person must be
assumed to respond openly and honestly to the items. These two
assumptions are simple and intuitively appealing. In order to predict
accurately a person's behavior, the person must know what his attitude
is and must honestly report it. To the extent that these two
assumptions are not met, predictions will be poor.

Unfortunately, the validity of the last two assumptions has plagued
social psychologists from the beginning. People apparently do not
always know how they feel about social objects, and more importantly,
even if they do know, there are many reasons why individuals either
will not reveal their attitudes, or, in fact, will give deliberately
misleading answers. In our culture responses to attitude
questionnaires are affected by a positivity effect and social
desirability. By positivity effect is meant a general tendency,
everything else being equal, to say nice things rather than negative
things about other people. In most experiments which have been
designed to affect the liking or disliking of one person for another,
the liking is stronger than the disliking. Also, there is a strong
tendency for individuals to give socially desirable answers. That is,
when an investigator is trying to get a measure of a socially
disapproved attitude, there is a strong tendency for respondents to
give socially more acceptable responses. For example, in many segments
of our society it is not socially acceptable to express negative
attitudes toward blacks, Mexican-Americans, Italians, women, etc. Yet,
many Americans clearly do have negative attitudes toward one or more
of these groups, so that when confronted with a statement such as "I
dislike being around blacks," "I think blacks are inferior," or "Women
should stay in the home," etc., they will tend to give neutral or
slightly positive responses even when in fact their attitudes are
strongly negative. Here is the main problem. Whereas a social
psychologist wants answers to reflect true feelings, respondents are
usually concerned with what others will think of them.

The social psychologist's problem is that he seldom really knows
whether the subjects' responses are genuine or a result of social
desirability. Giving false responses to make themselves look good is
most likely to occur when respondents know that some other person
will become aware of what their attitudes are. To alleviate this
problem, social psychologists tend to administer their questionnaires
in large groups in which it is virtually impossible for the subjects'
responses to be identified. However, even under these circumstances
there is reason to believe that subjects still tend to respond on the
basis of what is socially desirable. For example, one of my
colleagues, Dr. J. Brigham, has been interested for the last eight
years in whites' attitudes toward blacks. He has had to give up
several research projects because he cannot find very many
"prejudiced" individuals in Tallahassee; he cannot find subjects who
will give a self report indicating "I dislike blacks." Of course,
this could mean that there are no prejudiced individuals in
Tallahassee, although given the hiring and residential practices of
our city, we are dubious in the extreme of that proposition. A much
more likely explanation is that many persons are responding more on
the basis of what they know society wishes them to say than on the
basis of their own true feelings.

In spite of these limitations, self reports are the most popular and
frequent way of measuring attitudes, a fact that will continue to be
true because, compared to other approaches, self-report measures are
easy to develop and administer, and they are economically feasible.
At the same time we must constantly keep in mind that people are not
always in touch with their dispositions, and, even when they are,
they will not always give completely truthful responses, particularly
when they are concerned with being evaluated. We are still looking
for a satisfactory solution to these problems.

Indirect methods. The indirect approach to measuring attitudes
involves exposing an individual to a relatively unstructured or
ambiguous stimulus situation. A person's responses to a properly
chosen ambiguous stimulus are assumed to reflect his or her
attitudes. For example, Haire (1950) presented the following shopping
list, made out by a hypothetical woman, to one half of a sample of
housewives:

    1 1/2 lbs. of hamburger
    2 loaves of Wonder bread
    bunch of carrots
    1 can Rumford's baking powder
    Nescafe instant coffee
    2 cans Del Monte peaches
    5 lbs. potatoes

The other half of the sample were presented with the same list except
that "1 lb. Maxwell House coffee (drip grind)" was substituted for
Nescafe. Each respondent was asked to look over the shopping list and
then to write a brief description of the personality or character of
the woman who had made out the list. The differences between the
descriptions of the hypothetical woman who bought Nescafe compared to
the one who bought Maxwell House coffee were rather striking.
Approximately half of the women who read the list containing the
instant coffee described its buyer as lazy and failing to plan her
household purchases well; the woman who bought the drip grind coffee
was rarely described in these terms. In addition, the woman who
purchased the instant coffee was more often seen as a spendthrift and
a poor wife. Moreover, a check of the pantries of the respondents
showed that most of the women who described the buyer of the instant
coffee in unfavorable terms did not actually have instant coffee on
their shelves, whereas those who did not describe her unfavorably
were much more likely to have instant coffee. In short, it seemed
that interpretation of the decision to buy instant coffee was
influenced at least as much by attitudes about what constitutes good
housekeeping as by reaction to the flavor of instant coffee. These
attitudes might not easily have been elicited by a direct approach.
Other investigators have used sentence-completion tasks as indirect
measures of attitudes. Kerr (1943) studied the national stereotypes
held by the English people by presenting individuals with the
following sentences to complete:

    The thing I do admire America for is. . . .
    The trouble with America is. . . .
    When I think of the Russians, I think of. . . .
    If the British and Soviet armies fight side by side they. . . .
    If you invite an American to your home he may. . . .

Burwen, Campbell, and Kidd (1956) employed an incomplete sentence
test as one of a number of measures of attitudes toward superiors and
subordinates in an Air Force population, with sentence parts such as:

    He never felt comfortable in the presence of. . . .
    Whenever he saw his superior coming he. . . .

The assumption underlying sentence-completion tasks is that the way
an individual completes the sentences is a reflection of his
attitude. In the two examples above, subjects favorable toward
America and/or Russia would be more likely to complete the sentences
in favorable ways than subjects who have unfavorable attitudes.
Likewise, the statements concerning superiors and subordinates would
be completed in ways which are consistent with the individual's
attitude. In both studies cited above the results supported this
assumption.

Still another indirect approach is to present individuals with
pictures of other people and ask them to respond to what is
presumably happening in the picture. For example, in a study of
attitudes toward physicians one might present a series of pictures
portraying physicians engaged in a variety of activities. Subjects
might be asked to describe the setting, the activities, and a
probable outcome, or they might be asked to provide dialogue such as
the probable response of a patient to a physician who is saying, "I
can't help you if you don't follow my orders." Again, it is assumed
that the response of the subject to the task reflects the subject's
attitudes.

As with direct approaches and the approaches discussed below, there
are both advantages and disadvantages to the use of indirect ways of
measuring attitudes. The advantages claimed for indirect approaches
are as follows: (1) they encourage in respondents a state of freedom
and spontaneity of expression; (2) they can tap a person's attitudes
on issues where the person cannot easily evaluate or describe his or
her motivations or feelings; (3) they are particularly useful when
they are employed on topics on which respondents may hesitate to
express their opinions directly for fear of disapproval by the
investigator (a major problem with direct approaches); (4) they may
be the only means available, e.g., when respondents are likely to
consider direct questions as unwarranted invasion of privacy or to
find them threatening for some other reason.






While many of the indirect measures are highly ingenious, an
investigator must consider their disadvantages before deciding to use
one of them. The main disadvantages are: (1) they usually involve at
least some degree of deception and occasionally some invasion of
privacy, since individuals are induced to respond under some pretext
other than the investigator's true interest and since they are
encouraged to reveal matters that they might perhaps wish to conceal;
and (2) very few, if any, of these measures have been subjected to
any extensive evaluation of either their reliability or validity.
That is, investigators employing the same indirect measure often get
conflicting results, and indirect measures do not correlate very
highly, if at all, with other types of measures designed to tap the
same attitude. Perhaps because of reliability and validity problems,
indirect approaches to studying attitudes are not used very
frequently in social psychology.**

* 

Physiological measures of attitudes. At the opposite end of the
continuum from measures relying on an individual's self reports are
those measures relying on physiological responses not subject to
conscious control. While the study of such measures depends, of
course, on the subject's willingness to cooperate, the results are
usually independent of either self knowledge or willingness to
report. The usual procedure is as follows: individuals are exposed to
the presence of a member of an object group or to pictorial
representations of situations involving members of the object group,
and involuntary physiological reactions are recorded simultaneously.
These measures often involve the galvanic skin response, blood
pressure, heart rate, and dilation or constriction of the pupil of
the eye. These measures are based on the fact that physiological
changes accompany the experience of emotion, and the underlying
assumption is that the physiological measures of these changes are
indicative of attitudes.



As an illustration, Rankin and Campbell (1959) employed two
experimenters, one white and one black, to attach and adjust the
electrodes necessary for measurement of the galvanic skin response.
Results indicated significantly larger galvanic skin responses when
the black experimenter adjusted the electrodes than when the white
experimenter did. Similarly, Cooper and his associates (Cooper &
Siegel, 1956) found greater galvanic skin responses to the names of
negatively valued groups than to those of neutrally valued groups. In
addition, they found that galvanic skin responses increased to both
complimentary statements about disliked groups and derogatory
statements about valued groups. In each case, the underlying
assumption was that the changes in physiological arousal were a
result of the individual's attitudes.



** It should be pointed out that these measures are popular in
clinical psychology. In fact, many of the measures that we have
discussed have been adapted from tests designed for clinical
populations. However, even in clinical psychology the evidence for
either their reliability or validity is in question.



More recently, there has been mounted an impressive series of studies
which indicate that the dilation and constriction of the pupil of the
eye is related to an individual's attitudes. Specifically, Hess's
evidence indicates that an individual's pupils dilate in response to
pleasurable stimuli and constrict in response to unpleasant stimuli.
These promising findings, along with the great potential that social
scientists often see in physiological measures, made this technique
quite interesting and even exciting. However, recent systematic
research by Woodmansee (1970) has not only failed to replicate Hess's
results but has further shown that the pupil of an individual's eye
dilates not only to pleasant stimuli but also to extremely unpleasant
stimuli, e.g., a picture of a filthy toilet in a broken-down bathroom
or a picture from a gruesome murder case involving a local coed.
Thus, at present the dilation of the pupil of the eye may not be as
promising a technique as was originally thought, although it may very
well at least index interest and attention.

While physiological measures have the advantage over direct measures
that it is more difficult for the subject to fake or give false
answers, and the apparent advantage over indirect measures of being
more precise and objective, the disadvantages are also very apparent.
First, the obtaining of attitudinal measures is usually restricted to
a defined physical setting where the available resources permit
proper recording. Second, increases and decreases in physiological
arousal cannot be interpreted without knowing what the environmental
stimuli are to which the subjects are responding. Third, at least
with the physiological measure of dilation and constriction of the
pupil of the eye, there is serious concern with respect to
interpretability. Fourth, studies employing more than one
physiological measure to tap the same attitude often result in one
measure indicating a finding that the others do not; when this occurs
it raises questions of exactly what the various physiological indices
are measuring.

Notwithstanding these criticisms, physiological measures of attitudes
may very well prove to be more reliable and valid in the future. Work
by Cook (1968) indicates that subjects who were conditioned to
respond favorably to statements concerning the attitudinal object
responded favorably in terms of physiological measures to other
positive statements, and subjects who were conditioned to respond
negatively to the attitudinal object responded negatively to other
negative statements. Results of Cook's work are promising, but this
technique is not far enough along to warrant any conclusions about
its usefulness as an attitude measure.

Observational methods. Another approach to measuring attitudes is to
observe an individual interacting with some social object. For
example, Mehrabian (1969) has mounted a program of research which
indicates that nonverbal behavior is clearly related to attitudes
toward another person. In particular, Mehrabian finds that positive
attitudes are related to assuming closer interpersonal distances,
more eye contact, more direct shoulder orientation, and more forward
lean than are negative attitudes. In other words, our nonverbal
behaviors are more intimate with those whom we like than with those
we do not like.

Another area of research that is making use of observational methods
is the field of program evaluation, concerned with measuring the
effectiveness of social programs. Public institutions concerned with
such topics as health, crime, and education are increasingly being
called upon to demonstrate the effectiveness of programs which
taxpayers are supporting. For example, Bickman (in press), in
evaluating the effectiveness of a mass media campaign designed to
encourage the reporting of shoplifters, found that the campaign was
effective in communicating and altering an individual's intentions
but not in increasing the number of cases that were reported. In
other studies appraising the effectiveness of the mass media it has
been found that there was little effect on such behaviors as
aggression (Feshback & Singer, 1970) or automobile seatbelt use
(Robinson et al., 1973).

Individuals who are concerned with social action research often have
employed observational methods. A good example is provided by Saltman
(1975). Concern over the implementation of anti-discrimination
housing laws led Saltman to audit a number of real estate companies
in the Akron area. Saltman sent black and white volunteers to each
real estate company. The volunteers kept written accounts of their
observations, which were then coded to indicate possible forms of
discrimination. The results indicated that twelve out of thirteen
companies practiced some form of discrimination.

As with the other ways of measuring attitudes, observational methods
have their advantages and disadvantages. The advantages of
observational methods are: (1) they can tell us a great deal about
behavior patterns; (2) they can aid us in the selection of problems
and hypotheses; (3) observation may be the only feasible method by
which to gather data, e.g., research with children or schizophrenic
persons or research concerning how people react to natural disasters;
(4) they allow an investigator to record an individual's ongoing
behavior as it occurs; thus scientists concerned with how people
interact under certain circumstances can observe their behavior under
those circumstances.

The disadvantages of observational methods are numerous. First,
ethical problems (invasion of privacy) do arise, particularly when
individuals are unaware that they are being observed. Second, when
people know that their behavior is being observed, the investigator
frequently encounters the same problem as with self reports, e.g.,
subjects alter their behaviors to make themselves look good. Third,
it is not always clear whether the observed behavior reflects an
underlying disposition (attitude) or whether the behavior is
appearing for some other reason, e.g., a behavior may be nearly
independent of external patterns of stimulation. Fourth, without the
manipulation of variables it is difficult to clearly establish cause
and effect relationships.

Despite these disadvantages, observational methods have become
increasingly popular over the past few years. As social psychologists
have become increasingly interested in ecological psychology,
environmental psychology, social action, and program evaluation,
observational methods have acquired more respectability than they had
in the past.



Attitudes and the prediction of behavior

Recall that the underlying rationale for studying attitudes is that
what people say is a good predictor of what they will do. Below are a
series of summary statements made by authorities who have analyzed
and evaluated the numerous studies on the relationship between
attitudes and overt behavior.

    Studies on the relations of attitudes and behavior have almost
    consistently resulted in the conclusion that attitudes are a poor
    predictor of behavior (Ehrlich, 1969).

    Attitude research has long indicated that the person's verbal
    report of his attitude has a rather low correlation with his
    actual behavior toward the object of the attitude (McGuire,
    1969).

    Most researchers have had little success in predicting behavior
    from attitudes toward ethnic groups (Brigham, 1971).

    There is a growing awareness among investigators that attitudes
    tend to be unrelated to overt behaviors (Fishbein & Ajzen, 1972).

The best known example of the discrepancy between attitudes and
behavior came as early as 1934. A social psychologist, LaPiere,
traveled from coast to coast with a young foreign Chinese couple,
stopping at over 250 hotels, autocamps, cafes, and restaurants and
receiving normal service in all but one. Six months after the trip,
LaPiere mailed to each of these establishments a simple questionnaire
which included the question "Will you accept members of the Chinese
race in your establishment?" The answers he received were 92% "No,"
despite the fact that all but one of these places had, in fact,
served his Chinese friends not long before. In other words, the
verbal responses were just the exact opposite of the behavioral
responses. This state of affairs not only defies intuition and common
sense, but it has frustrated and annoyed social psychologists for
years.
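The size of such a discrepancy can be expressed as simple
proportions. The sketch below is a minimal illustration, not
LaPiere's own tabulation: the function name
`attitude_behavior_summary` and the 25 records are hypothetical,
chosen only so that 92% of the verbal answers are "No" while every
establishment in the example actually gave service.

```python
def attitude_behavior_summary(records):
    """Summarize paired behavioral and verbal responses.

    records: list of (served, said_yes) boolean pairs, one per
    establishment. `served` is the observed behavior; `said_yes` is
    the questionnaire answer to "Will you accept ...?".
    """
    n = len(records)
    if n == 0:
        raise ValueError("at least one record is required")
    served = sum(1 for was_served, _ in records if was_served)
    said_yes = sum(1 for _, yes in records if yes)
    # A record is consistent when the verbal answer matches the behavior.
    consistent = sum(1 for was_served, yes in records if was_served == yes)
    return {
        "served_rate": served / n,
        "yes_rate": said_yes / n,
        "consistency_rate": consistent / n,
    }


# Hypothetical data: 25 establishments, all of which gave service;
# 23 later answered "No" and 2 answered "Yes".
records = [(True, False)] * 23 + [(True, True)] * 2
summary = attitude_behavior_summary(records)
```

With these made-up records the behavioral acceptance rate is 100%,
the verbal acceptance rate is 8%, and only 8% of the pairs are
consistent, which is the pattern the paragraph above describes.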

In attempting to account for the failure of attitudes to predict
behavior, social psychologists have identified three factors in
addition to an individual's not knowing what his attitude is or
lying. These factors are measurement problems, conflict among
attitudes, and situations. Let us briefly discuss each in turn.

Measurement. The typical procedure has been to determine a person's
feelings toward a general class of objects (members of the Chinese
race) and use this information to predict that person's behavior
toward a particular member of the class (a Chinese couple). The more
the particular member of the class deviates or differs from the
general class, the more difficult it becomes to make accurate
predictions. In LaPiere's study the Chinese couple that was admitted
to the various establishments may have possessed very few, if any, of
the characteristics or stereotypes held by the subjects. In fact, by
being well-dressed and in the company of an occidental professor,
they were almost certainly not much like the image of "a Chinaman"
that proprietors intended not to serve.

Conflict among attitudes. People often have more than one attitude
toward any object, and the discrepancy between attitudes and behavior
often occurs because other more dominant attitudes are operating in a
particular situation. For example, a physician who is a strong
proponent of HMOs may not be willing to speak publicly in favor of
them because of an even stronger feeling that physicians should not
actively lobby for their own medical interests. The intensity and
extremity of attitudes both probably vary somewhat from time to time
as a result of recent experience, and an attitude may be strong
enough to dominate at one time but perhaps not on all occasions.

Situations. Perhaps the most important factor accounting for the
discrepancy between attitudes and behaviors is the constraints on
behavior that exist in any situation. Situational factors are very
powerful determinants of behavior. We are not really "free" to behave
in any way we might like in just any circumstances. Some of the
constraints represent incapabilities of responding in certain ways in
certain situations; others represent constraints derived from social
expectations and rules. As an example of the first kind of
constraint, it has been noted that policemen do not seem to change
their behavior very much, even when they know they are being
observed, and they often engage in rather undesirable or
unprofessional behavior with observers present. One possible
explanation that has been posed for such behavior is that the
behavioral repertoire of many policemen is quite limited, and they
literally cannot behave differently than they do in some situations.
Another constraint by inability to respond would be failure to donate
to a highly favored charity because of lack of money at the time of
solicitation. The kinds of constraints stemming from social
conventions are illustrated by the substantial uniformity of behavior
in church, the fact that military enlisted men will usually say "Sir"
even to officers for whom they have no respect, etc. The difficulties
that have been met in identifying consistencies in behavior,
accompanied by recognition of the very obvious and substantial
importance of situational factors, have led more and more social
psychologists to ignore differences between persons and concentrate
on situational factors in determining behavior. Whereas 30 years ago
the social psychologist's bias was toward individual dispositions,
today the bias is toward situational factors.



Conclusion

From what has been presented it is easy and perhaps even logical to
conclude that the study of attitudes is a waste of time. Many social
psychologists have accepted such a conclusion. While such a
conclusion can be partially supported by the empirical data, there is
in my estimation still room left for the study of differences between
persons in attitudes and related behaviors.

Recently two social psychologists, Bem and Allen (1974), have
suggested that part of the problem of identifying consistency in
behavior has been to identify the set of behaviors across which
consistency is to be expected. For example, if a soldier is asked
whether he likes vegetables and then it is discovered that he will
not eat rutabaga, kale, acorn squash, or okra, it might not be
reasonable to conclude that the soldier does not like vegetables
after all. A better procedure might be to ask first what the soldier
considers to be edible vegetables and then determine whether he likes
those on his list. Similarly, if one wishes to predict whether a
person will "cheat" based on self-report of disposition to cheat, it
would be a good idea to find out from the subject just what he or she
considers to be cheating behavior. Moreover, one can find out
directly from a person about behavioral consistency. A student might
say "I always keep my room neat and tidy, but my car is always a
mess." Both those statements might be found to be true, in which case
the student could be considered quite consistent in behavior, but not
necessarily within fairly arbitrarily defined categories.

If one wanted to follow such an approach in studying the attitudes of
the public toward a rescue service, one would want first to find out
what services the respondent believed were provided and what the
important factors were in the provision of such services. It might
then be determined that the person was consistently pleased with
response times and with the technical quality of the services but
dissatisfied with the demeanor of ambulance attendants while handling
lower class and indigent victims.

This approach is promising, but it is too early to make a definitive
judgment on its value. However, it should be clear that the
assumption of consistency of responses toward social objects has been
given up, and social psychologists are now looking at a person's
feeling toward a specific object in a specific situation, and then
observing for the corresponding behaviors.

In summary, the social psychologist's assumption that attitudes lead
to a pattern of consistent responses (particularly consistent overt
behaviors) toward a social object cannot be supported by the existing
empirical data. Rather, an individual's behavior seems to be affected
by conflicting attitudes as well as situational factors. The most
promising approach appears to be more specificity in the questions
that are asked so as to be able to predict when an individual's
dispositions will lead to consistent or inconsistent behaviors.



References

Allport, G. W. Attitudes. In C. Murchison (Ed.), A handbook of social
psychology. Clark Univ. Press, 1935.

Bem, D. J. & Allen, A. On predicting some of the people some of the
time: The search for cross-situational consistencies in behavior.
Psychological Review, 1974, 81, 506-520.

Berkowitz, L. Social psychology. Glenview, Ill.: Scott, Foresman,
197i.

Bickman, L. Bystander intervention in a crime: The effect of a mass
media campaign. Journal of Applied Social Psychology, in press.

Brigham, J. C. Ethnic stereotypes. Psychological Bulletin, 1971, 76,
15-38.

Burwen, L. S., Campbell, D. T., & Kidd, J. The use of a sentence
completion test in measuring attitudes toward superiors and
subordinates. Journal of Applied Psychology, 1956, 40, 248-250.

Cook, S. W. Studies of attitudes and attitude measurement
(Mimeograph). AFOSR Technical Report, 1968. Boulder: Institute of
Behavioral Science, Univ. of Colorado.

Cooper, J. B. & Siegel, H. E. The galvanic skin response as a measure
of emotion in prejudice. Journal of Psychology, 1956, 42, 149-155.

Ehrlich, H. J. Attitudes, behavior, and the intervening variables.
American Sociologist, 1969, 4, 29-34.

Feshback, S. & Singer, R. D. Television and aggression. San
Francisco: Jossey-Bass, 1970.

Fishbein, M. & Ajzen, I. Attitudes and opinions. Annual Review of
Psychology, 1972, 23, 487-544.

Haire, M. Projective techniques in marketing research. Journal of
Marketing, 1950, 14, 649-656.

Kerr, M. An experimental investigation of national stereotypes.
Sociological Review, 1943, 35, 37-43.

LaPiere, R. T. Attitudes vs. actions. Social Forces, 1934, 14,
230-257.

McGuire, W. J. The nature of attitudes and attitude change. In G.
Lindzey & E. Aronson (Eds.), The handbook of social psychology (Vol.
3, 2nd Ed.). Reading, Mass.: Addison-Wesley, 1969.

Mehrabian, A. Some referents and measures of non-verbal behavior.
Behavior Research Methods and Instrumentation, 1969, 1, 203-207.

Rankin, R. E. & Campbell, D. T. Galvanic skin response to Negro and
White experimenters. Journal of Abnormal and Social Psychology, 1959,
31, 30-33.

Robinson, J. P. & Shaver, P. R. Measures of social psychological
attitudes. Ann Arbor, Mich.: Institute for Social Research, 1969.

Saltman, J. Implementing open housing laws through social action.
Journal of Applied Behavioral Research, 1975, 11, 39-61.

Woodmansee, J. J. The pupil response as a measure of social
attitudes. In G. Summers (Ed.), Attitude measurement. Chicago: Rand
McNally, 1970.






Recruitment, Selection, Training and Supervision
of Civilian Observers to Work in Police
Patrol Operations Research



William Bieck
Kansas City, Missouri Police Department
Operations Resources Unit
Kansas City, Missouri




It seems inevitable that if the quality of performance of emergency
medical personnel is to be evaluated in an adequate way, observers
are going to have to be deployed at performance sites, whether in
vehicles or in ERs. The development and monitoring of a good observer
team is no small feat. This paper details the procedures followed by
William Bieck, who has had unusual success with an observer study in
the police field. His paper also conveys a good bit about the
procedures which are necessary in order to achieve a high level of
data quality control.



Before proceeding with the topic to be discussed regarding the
recruitment, selection, training, and supervision of civilian
observers who worked on the Response Time Analysis Study, mention
should be made of the study itself in order to provide the listener
with sufficient background information to assess the context in which
the observers functioned.

The Response Time Analysis Study, a five-year project funded through
the National Institute of Law Enforcement and Criminal Justice, the
research arm of the Law Enforcement Assistance Administration, is
currently being conducted by the Kansas City, Missouri, Police
Department, the agency which was the recipient of the grant. The
major objective of the study was to analyze relationships between
time taken to report crime or request police service, process and
dispatch citizen requests, respond to locations from which assistance
has been required, and measure probabilities associated with on-scene
criminal apprehension, witness availability, victim injury, and
citizen satisfaction with police response time. The second objective
sought to analyze problems and patterns in crime reporting or
requests by citizens for police assistance.

A total of six data collection components were established in order to obtain information necessary to address questions generated by these objectives:



1) Observer Component. The Observer Component, the focus of this presentation, consisted of nine civilian observers, two females and seven males, who accompanied police officers, involuntarily, for a period of ten months. The observers rode four eight-hour tours per week with police officers assigned to police the city's most active robbery and assault beat-watches. The primary responsibility of each observer was to record times documenting officer dispatch, response and arrival to citizen contact and the location to which the officer had been sent. Additional information concerning locations from which and to which officers had been dispatched and a description of on-scene activities, e.g., completion of an offense report, criminal apprehension, administration of first aid, or request for an ambulance or other police specialists, etc., was also obtained.

2) Tape Content Analysis Component. All calls coming into the Kansas City, Missouri, Police Department that are processed through the communications-dispatch center are recorded on tape. The main purpose of this segment of the study was to record times pertaining to the initial connection between citizens and police dispatchers, crime reporting or service requests by citizens, and broadcast and dispatch messages to field officers. Additional information collected included an analysis of the taped conversations between citizens and dispatchers to identify problems in citizen interactions with dispatchers and dispatcher communication in transmitting assignments to field officers.

3) Citizen Follow-up Interview Component. Individuals who reported crimes, requested police assistance or were victims of criminal offenses were identified and interviewed in order to obtain information regarding the time at which the crime occurred or was discovered, the length of criminal visibility if a suspect was seen, the location at which the crime occurred, the citizen's activities before the commission of the incident took place, the time taken and problems encountered in reporting the incident to the police, and the citizen's satisfaction with police response time and the officer's on-scene activities. Additional data collected included the victim's knowledge, if any, of the suspect involved in the incident together with demographic characteristics of victims and witnesses.

As can be seen from a review of these collection components, information is available to construct a time continuum consisting of intervals which, for example, account for the time taken for a criminal offense to occur, the time taken in reporting the incident to the police, the time taken to process the call through the communications-dispatch center, and the time taken by an officer to respond to and contact the citizen who initiated the mobilization.
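The time continuum can be illustrated with a minimal sketch. The class name, field names, and the example timestamps below are hypothetical and are not taken from the study's instruments; only the four intervals themselves come from the text.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class IncidentTimes:
    """Hypothetical timestamps for one incident (all field names are assumptions)."""
    offense_start: datetime       # crime began
    offense_end: datetime         # crime ended or was discovered
    call_connected: datetime      # citizen reached a police dispatcher
    officer_dispatched: datetime  # dispatcher assigned a field officer
    citizen_contacted: datetime   # officer made contact with the citizen


def continuum_minutes(t: IncidentTimes) -> dict:
    """Split one incident into the four consecutive intervals named in the text."""
    def minutes(start: datetime, end: datetime) -> float:
        return (end - start).total_seconds() / 60.0

    return {
        "offense": minutes(t.offense_start, t.offense_end),
        "reporting": minutes(t.offense_end, t.call_connected),
        "dispatch_processing": minutes(t.call_connected, t.officer_dispatched),
        "response": minutes(t.officer_dispatched, t.citizen_contacted),
    }


# Made-up example values, for illustration only.
example = IncidentTimes(
    offense_start=datetime(2000, 1, 1, 20, 0),
    offense_end=datetime(2000, 1, 1, 20, 5),
    call_connected=datetime(2000, 1, 1, 20, 11),
    officer_dispatched=datetime(2000, 1, 1, 20, 14),
    citizen_contacted=datetime(2000, 1, 1, 20, 20),
)
intervals = continuum_minutes(example)
total_minutes = sum(intervals.values())
```

Because the intervals are consecutive, their sum equals the elapsed time from the start of the offense to citizen contact, which is what allows each collection component to contribute one segment of the same continuum.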

The three remaining collection components consisted of a "Test Call" experiment to measure the amount of time required to reach a police dispatcher through the police department's "Crime Alert" telephone number (emergency or police assistance), the department's administrative telephone number, and the Southwestern Bell telephone operator. This information, which was collected between the times of seven and [illegible in source], seven days a week, was necessary to evaluate the subjective responses given by citizens in reporting crimes or requesting police service.

A "Victim Injury Follow-Up" survey was conducted to determine the degree or extent of seriousness associated with victim injury resulting from crime or other emergency medical [illegible in source].

Finally, an "On-Scene Arrest and Conviction Follow-Up" component was initiated to assess probabilities associated with criminal justice dispositions. Tracking Part I felons through the criminal justice system was considered necessary in order to evaluate the ultimate importance of on-scene arrests as a product of rapid police response given the suspicion that convictions for the same grounds as arrest would be few. Reasons for judicial fallouts are also being obtained.

Having considered the methodological framework in which data were collected, specific attention will now be focused upon the observer component. The decision to utilize civilian observers on the Response Time Analysis Study was made with disciplined reluctance. Although necessitated by the need to obtain information unavailable through more conventional means, the employment of civilians to accompany police officers during routine tours of patrol presents a multiplicity of challenges even for the most astute administrator with a flair toward research. Problems encountered given the decision to employ civilian observers can be couched under three headings:

1) Administrative. Once statistical calculations had been computed to determine the number of incidents needed for adequate and representative analysis, an exercise that also predicted the number of observers to be hired, the salient and immediate concern posed by the establishment of an observer program was cost. Suffice it to say that from the project's inception no provision was provided in the original proposal for an observer component.

Unanticipated cost-of-living salary increases which were triggered by unprecedented inflationary rates served to compound concern for budgetary strain during the fledgling stages of the study. Of tantamount importance, salary increases also escalated fringe benefit payments which are computed at fifteen percent of gross earnings.

As a result of the observer program, additional supervisory, liaison, quality control and clerical staff were also needed to coordinate and disseminate information, maintain service records, follow through on chest x-rays and flu inoculations, issue and complete travel vouchers, insurance forms and time records, secure office space, prepare supply and equipment requisitions, and manage a part-time, non-profit placement service for those confronted with bleak prospects of future employment opportunities once field data collection had been completed.

2) Managerial. With nine full-time civilian observers, one overall collection supervisor, one observer supervisor, one liaison officer and one quality control clerk, considerable effort was directed toward establishing lines of communication and delineating areas of responsibility.

The observers were given their own field quarters which contained an office for their supervisor, a conference room and a small but functional message center. Although distance per se created ripples of alienation among observers toward the administrative and analysis staff, who were located in the central business district adjacent to police headquarters, separate office facilities were more convenient, being strategically located between division stations where field tours commenced, and provided the observer supervisor with sufficient latitude to acquire a working knowledge of each observer's values, expectations, aspirations, and idiosyncrasies. Concessions were made by the observers' supervisor regarding alterations in scheduling so that exceptions could be made to accommodate those wishing to pursue course work at local colleges and universities. Coordination of training sessions, where observers were required to provide assistance in instrumentation construction and modification, deployment scheduling, and control of rumor and innuendo, which surfaces as an incessant problem whenever civilians and sworn personnel are forced to work together, consumed major attention in addressing managerial issues.

3) Methodological. In general, utilization of trained observers is indicative of the state of the art in which research is being conducted. Observer components exemplify an admission that little is known about the subject to be researched. It also suggests that the nature of the investigation is exploratory and descriptive rather than experimental; the latter of which can usually anticipate extraneous variation and hence control (a concept central to scientific inquiry), hold constant, or account for influences which might affect the relationships being tested.

Although the utilization of trained observers to collect data on research projects is elementary given its methodological niche vis-a-vis more sophisticated techniques used in elaborate research designs, problems associated with the administration and management of such endeavors are extremely complex. Without pursuing an epistemological tangent regarding the historicity of science, what science is and is not, suffice it to say that two methodological limitations are inherent in observer data collection procedures: 1) Control Effect. Control effect refers to the change or influence the observer creates by his own presence in the situation he is studying. In more concrete terms, police officers who are aware of the observer's responsibility to obtain information pertaining to response times might be inclined to drive faster (or slower) in order to impress a novice civilian riding with them. Furthermore, officers might feel compelled, knowing that they are being observed, to be more thorough in the conduct of on-scene activities, e.g., report taking, processing evidence, etc.; and 2) Biased-viewpoint Effect. This concept describes the potential for an observer to become emotionally consumed into the situation under investigation, thereby militating against his objectivity. An observer might be positively co-opted by a patrolman in terms of fabricating data that would place the officer in a favorable light, or become negative toward policemen and the manner in which calls are handled.

As can be adduced from this discussion, either limitation, unless checked, will lead to serious distortion in data collection and analysis.

Having reviewed the setting in which the observer component functioned and problematic considerations generated by the decision to establish an observer program, it is time to proceed to the business of recruitment, selection, training and supervision of observers.






The qualities necessary for a good observer were not easily defined. The role demanded a person with a complex and sometimes inconsistent set of attributes. A good observer would have to face and handle many ambiguities inherent in police-citizen encounters, requiring him to have considerable adaptability to a broad range of situations. Those situations would vacillate between extreme boredom and intense stress. In addition the role would require an unobtrusive individual who could passively blend into any setting, yet actively collect pertinent and accurate project data. Other characteristics such as good judgment, dependability and honesty would also be necessary to insure systematic observations and qualitative data. Since all observers would be contract employees of the Kansas City, Missouri, Police Department, they would have to pass a thorough background investigation.

Initially, it was decided that only male candidates would be recruited as observers, the rationale being that a female observer functioning in a predominately male line of work would introduce an element of bias to both police officers and citizens by producing expectations for which it would be difficult to control. The easiest role for a civilian observer to acclimate in the police-citizen milieu was either that of a plain-clothes detective or a police recruit. It was considered problematic to cast a female in either role. Since no empirically tested data were available to support such a position, the legal obligation of the police department to be non-discriminatory in its hiring practices (the study was also federally funded) resulted in the position being opened to both sexes.

Initial concern about accepting female observers was borne out somewhat during the first weeks of field observation. One incident involved a woman who had called the police regarding a destruction of property complaint. When contacted later by a telephone interviewer she said the officer had arrived late on the scene (he was accompanied by a female observer) and she had assumed he had picked up his girlfriend prior to responding to the call. In another police-oriented study involving observers in Rochester, New York, citizens' complaints were so frequent that specially designed blazers had to be worn by all females while conducting their field work.

To mitigate role conflict between officers and civilians on the RTAS, all observers were required to display department identification which consists of a personal photograph captioned "POLICE—CIVILIAN EMPLOYEE." After instituting that procedure citizen complaints abated.

The only specific criteria first required for application for an observer position were a minimum age limit of 21 and completion of high school degree requirements. It was later learned that the minimum age stipulation was not a department requirement, the lower age limit being 17 with parental consent to work. Minimal application criteria were established because of the lack of evidence regarding the most suitable background for observer candidates. In fact, employment criteria were so general that almost anyone might qualify for the position.

The most immediate market for qualified candidates at first appeared to be local Associate and Baccalaureate Degree programs. As a result all colleges and universities within a sixty mile radius of Kansas City having a liberal arts or criminal justice major were contacted. If the institution had a placement service, it too was contacted. Individuals involved in hiring civilians for the police department were also advised of the observer openings. The initial requests for applicants resulted in only fifteen persons applying for the nine position openings.

After initially receiving a poor response, recruitment efforts were accelerated and expanded to include out-of-state institutions. The Job Information Center at Sam Houston State University in Huntsville, Texas, was contacted. This school maintained several hundred resumes of eligible candidates in the criminal justice field. Northeastern University's job placement advisors for the College of Criminal Justice in Massachusetts were also notified of the openings. Finally the positions were advertised for two consecutive Sundays in The Kansas City Star, the metropolitan area's major newspaper.

The second round of inquiries, including the newspaper advertisements, brought an improved response. Over 200 inquiries were received, and of those, 176 agreed to submit resumes. A total of 104 resumes were finally received, sixty-nine percent from males and thirty-one percent from females.

A revision of the project timetable to resolve research design issues resulted in a two-month delay of interviews for the observers. During that period, sixteen applicants found other jobs, four moved away, four changed their minds, three withdrew citing "bad hours" as the cause (normally observer shifts ran from 4:00 p.m. to midnight), and three others did not attend their scheduled interviews. The remaining dropouts were those recruited from Sam Houston and Northeastern Universities.
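The attrition figures can be tallied in a short sketch. It assumes, as the paper does not state outright, that every dropout came from the 104 applicants who submitted resumes and that the fifty interviews reported later in the paper account for everyone who stayed; the size of the out-of-state remainder is therefore an inference, not a figure from the source.

```python
# Figures taken from the text.
resumes_received = 104
interviews_held = 50  # reported later in the paper

itemized_dropouts = {
    "found other jobs": 16,
    "moved away": 4,
    "changed their minds": 4,
    "withdrew citing bad hours": 3,
    "missed scheduled interview": 3,
}

# Assumption: everyone who submitted a resume either interviewed or dropped out.
total_dropouts = resumes_received - interviews_held
itemized_total = sum(itemized_dropouts.values())

# Inferred, not stated in the source: the dropouts left over after the
# itemized reasons would be the out-of-state (Sam Houston and Northeastern) recruits.
inferred_out_of_state_dropouts = total_dropouts - itemized_total
```

Under these assumptions the itemized reasons cover 30 of 54 dropouts, which would leave 24 attributable to the out-of-state recruits.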

Originally, two members of the project staff had planned to travel to Texas and Massachusetts to screen prospective candidates. However, the two-month delay resulted in a diminished list of out-of-state applicants. Travel costs could no longer be justified given the number of candidates remaining, and after being advised that they would have to travel to Kansas City at their own expense (federally funded grants prohibit payment for relocation to new jobs), they declined further consideration.






A total of fifty interviews were finally held with thirty-eight male and twelve female candidates. The selection process involved three basic phases: 1) Personal interviews; 2) Field evaluation; and 3) A battery of short tests combined with a brief open-ended interview. Each applicant had to successfully complete all three stages in order to be eligible for final selection. Each phase was designed to examine particular attributes needed in a "good" observer. Characteristics deemed desirable for competent observation were evaluated in at least one of the three phases, and most were evaluated in a second or third phase, supplying a cross-reference indication of ability.

The initial phase involved one police officer and one civilian interviewer questioning each candidate for approximately an hour. Prior to the interview the applicant was asked to print his name, age, height, weight, and telephone number on the cover sheet of an interview form. This provided the interviewers with an indication of the applicant's ability to print letters and numbers legibly, an important factor in the coding of survey data forms, especially in anticipating that raw data would be obtained in moving police vehicles.

The interview began with a general explanation of the study and a job description. The candidate was then asked a series of questions regarding his career objectives, work experiences, educational background, and general interest and aptitude for the observer position. An ambiguous problem situation was described by the interviewers, and the applicant was asked to discuss it. Responses indicating rigid or extreme value orientations on behalf of a candidate were considered undesirable and potentially problematic for the observer role.

At the conclusion of the interview, each interviewer completed a rating form ranking the candidate's listening and communication skills, work experiences and general appearance as it applied to the role of an observer. Preference was given to applicants with a college degree, experience in applied research and a demonstrated interest in police operations. Such qualifications were expected to facilitate individual training for the observer role and improve the likelihood of qualitative data being collected. A summary rating on a scale from one to five was computed for each candidate. After all interviews were completed, the interviewers received the rating instruments to reevaluate any prior rejections and make final first-phase decisions. A unanimous "no vote" was required for an applicant to be rejected. Having completed screening during the first phase, twenty-one applicants remained eligible for the second phase (although twenty-five applicants were designated for the second phase, three withdrew, having accepted other positions, and one moved away).
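The first-phase decision rule can be sketched as follows. The paper says only that a one-to-five summary rating was computed and that rejection required a unanimous "no vote"; taking the summary as the mean of the rated items is an assumption made here for illustration, and the function names are hypothetical.

```python
from statistics import mean


def summary_rating(item_scores):
    """Summary rating on the 1-5 scale.

    Assumption: the summary is the mean of the individual item scores;
    the paper does not say how the summary was actually computed.
    """
    if not all(1 <= s <= 5 for s in item_scores):
        raise ValueError("item scores must lie on the 1-5 scale")
    return round(float(mean(item_scores)), 1)


def rejected_in_first_phase(votes):
    """Reject only when every interviewer votes 'no' (the unanimous 'no vote' rule)."""
    return len(votes) > 0 and all(v == "no" for v in votes)


# Hypothetical candidate rated by one sworn and one civilian interviewer.
example_rating = summary_rating([4, 3, 5])
split_decision = rejected_in_first_phase(["no", "yes"])
unanimous_decision = rejected_in_first_phase(["no", "no"])
```

Requiring unanimity to reject means a single favorable interviewer keeps a candidate in the pool, which errs toward carrying doubtful candidates forward to the field evaluation rather than screening them out on an interview alone.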



The interview teams involved in the selection process included two civilian employees of the department who had considerable experience as observers in police patrol operations and two police sergeants who had an extensive knowledge of field operations. One of the sergeants had been previously selected to supervise the field observers during training, pretesting of field instrumentation, and data collection. Several police officers and a department operations analyst conducted interviews when the regular interviewers were not available.

It was intended that at least one civilian and one uniformed officer would participate in the interview sessions; however, scheduling conflicts resulted in thirty percent of the interviews being conducted solely by sworn or civilian personnel. Of the fifty interviews completed, thirty-five involved both civilian and sworn interviewers, twelve involved strictly sworn interviewers and three included only civilian interviewers.

The second stage of the selection process required that observer applicants accompany police officers during routine patrol tours for a minimum of sixteen hours (normally twenty-four), after which evaluations were made by a pre-selected group of police officers. Candidates were given minimal instructions on how to behave and were expected to improvise in some situations. The evaluating officers were chosen by the sworn members of the interview teams to represent a variety of personalities and methods of employing police procedures in an attempt to expose each candidate to a variety of police styles, which were anticipated to be encountered during the fifteen months of field observations. Officers rated the prospective observers on the basis of compatibility (the major consideration), job interest, supervisability, courage, and inconspicuousness on calls. At the conclusion of each tour, a police sergeant conferred with the evaluating officer and compiled a ranking of those candidates evaluated by that officer. This process allowed an officer to reassess his earlier ratings given the broader field of reference he had developed. Only those candidates who received acceptable ratings from all officers with whom they rode were selected for the final phase. Of the twenty-one applicants that took part in the second phase, only twelve qualified for the final phase.
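The second-phase rule, that a candidate needed an acceptable rating from every officer with whom he or she rode, can be expressed as a short sketch. The candidate labels, officer labels, and the boolean encoding of "acceptable" are hypothetical; the paper does not describe the rating form in that detail.

```python
def advance_to_final_phase(ratings):
    """Return the candidates rated acceptable by every officer they rode with.

    ratings maps candidate -> {officer: True if rated acceptable, else False}.
    A candidate with no ratings at all is not advanced.
    """
    return sorted(
        candidate
        for candidate, by_officer in ratings.items()
        if by_officer and all(by_officer.values())
    )


# Hypothetical field-evaluation results for three candidates.
example_ratings = {
    "candidate_a": {"officer_1": True, "officer_2": True},
    "candidate_b": {"officer_1": True, "officer_3": False},
    "candidate_c": {"officer_2": True, "officer_3": True, "officer_4": True},
}
finalists = advance_to_final_phase(example_ratings)
```

Unlike the first phase, where rejection required unanimity, this rule lets any one officer's unacceptable rating remove a candidate, which is consistent with compatibility being named as the major consideration.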

The final phase of the selection process included a battery of paper and pencil tests and an open-ended interview with the interview team. The first exercise was a picture-recall test which is used by the Regional Center for Criminal Justice to determine police officers' ability to observe details at a crime scene. The second exercise was a symbol drill testing the candidates' dexterity and ability to print legibly. The final test, one developed by the Shipley Institute, provided an indication of the applicants' abstract reasoning ability and I.Q. level.

Once the tests had been completed, the observer candidates were again screened by interviewers in an open-ended interview. This provided interviewers, who had not previously seen some of the applicants, with a complete review of the final candidate field. The applicants were then ranked according to their scores in the second and third phases. These scores were considered with the personal evaluations of the interviewers in the final phase, and nine candidates were selected.
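The selection process as a whole can be summarized as a funnel. The counts below are the ones reported in the paper; the stage labels and the helper function are illustrative only.

```python
# Stage counts as reported in the text.
stages = [
    ("resumes received", 104),
    ("first-phase interviews held", 50),
    ("designated for second phase", 25),
    ("took part in second phase", 21),
    ("qualified for final phase", 12),
    ("selected as observers", 9),
]


def stage_yields(stage_counts):
    """Percent of the previous stage's candidates who reached each later stage."""
    yields = []
    for (_, previous), (name, current) in zip(stage_counts, stage_counts[1:]):
        yields.append((name, round(100 * current / previous, 1)))
    return yields


yields = stage_yields(stages)
overall_yield = round(100 * stages[-1][1] / stages[0][1], 1)
```

On these figures fewer than one in ten of the applicants who submitted resumes were ultimately selected.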

The nine individuals chosen constituted a diverse group. The oldest member was a thirty-six year-old female with a Master's Degree in Public Administration, who, incidentally, resigned after the first week of training to accept a position as director of a youth service agency. Her replacement was a twenty-eight year-old male who had been designated as an alternate from the final field of twelve. The youngest observer was a twenty year-old male with a high school diploma.

Of the nine observers selected, seven had Baccalaureate Degrees, of which three had also completed Masters Degrees. The average age of the observers was twenty-seven years. Final selection revealed that one-third of those initially selected was female. The group represented a variety of work experiences which included a correctional officer, a weather observer, a personnel technician, a psychiatric aide, a clerk typist and a research assistant.

Given the fact that those qualities which describe a "good" observer could not be defined at the outset, a meticulous selection process does not guarantee a successful observer program. Once selected, observers must be trained and then supervised throughout the entirety of data collection.

Observer training on the Response Time Analysis Study sought to achieve two objectives. First, it was expected to provide observers with a thorough understanding of police operations. This was considered necessary given the length of time data was projected to be collected and the realization that civilian observers would be riding with police officers in the highest crime areas of the city. Secondly, training was designed to insure that observers received a complete orientation regarding research methodology utilized on the study together with instruction concerning all collection and quality control components. Through a combined review of the occupation to be researched and the nature of the research to be undertaken, the observer gained a more complete understanding of his work and the responsibilities of sworn officers.

Before addressing specific aspects of the training format, a brief but important sidetrack is warranted. Having made decisions to establish an observer program, the number of observers to be employed and the methods by which candidates were identified, recruited and selected, the subject of the kind of observational technique needed on the Response Time Analysis Study was discussed. Hopefully, the following labels are self-explanatory, but there are at least four types of observer alternatives: 1) Participant Observer; 2) Observer Participant; 3) Complete Observer; and 4) Complete Participant. Although differences among these methods vary in degree, they also vary in kind. The distinction between a Complete Participant and a Complete Observer is absolute. Perhaps of interest to the lawman is the fact that these methods are also utilized by individuals outside the research community. For example, an undercover narcotics agent might wish to infiltrate a drug traffic operation in order to secure evidence. His "cover" or "front" must appear legitimate to his adversaries before admission and then participation in the group is permitted.

In short, unequivocal guidelines were established at the outset of training to define the observers' role as "Complete Observers." Their mission was first and foremost to collect data.

Actual training involved a collaborative effort among policemen, civilian researchers and project consultants. Training units on patrol operations, street and field procedures, first aid, self-defense and other aspects of police work were provided by the Field Operations Supervisor, who was in charge of the observers, with assistance from a retired police sergeant, who was the project's field liaison officer. Observers were given a tour of specialized units within the department, e.g., K-9, helicopters, traffic, etc., and received instruction on the operations and objectives of those units from member representatives. A seminar on epistemology, science and research methodology was conducted by the Principal Analyst, a former Assistant Professor of Sociology. Sessions on field data collection techniques and instrumentation development were delivered by an Operations Analyst, who had conducted field observations for over a year while employed on the Preventative Patrol Experiment, a study conducted in Kansas City which was funded through the Police Foundation. A special session dealing with the potential of observer co-optation and the concept of "going native" was presented by Dr. Albert Reiss, a Professor of Sociology from Yale University. Dr. Reiss had had considerable experience in directing observer programs in other police departments. Finally, an orientation to the department's overall research and programmatic activities was provided by the unit commander of the Operations Resource Unit, an operational planning agency responsible for organizational development and applied research efforts within the department.

Training topics included a project orientation (16 hours), rules and regulations (3 hours), a department orientation (18 hours), police work (42 hours), research methodology (16 hours), instrumentation development (76 hours), and field work (72 hours). Over sixty percent of the program was focused on instrumentation development and field work. In sum, the observer training program consisted of 243 hours of instruction, field tours, seminars, and discussions.
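The hour totals quoted above can be checked with a short tally. The topic hours are the ones listed in the text; the dictionary keys and the helper function are illustrative.

```python
# Training hours per topic, as listed in the text.
training_hours = {
    "project orientation": 16,
    "rules and regulations": 3,
    "department orientation": 18,
    "police work": 42,
    "research methodology": 16,
    "instrumentation development": 76,
    "field work": 72,
}

total_hours = sum(training_hours.values())


def share_percent(*topics):
    """Percent of total training time spent on the named topics."""
    return round(100 * sum(training_hours[t] for t in topics) / total_hours, 1)


instrumentation_and_field = share_percent("instrumentation development", "field work")
instrumentation_only = share_percent("instrumentation development")
```

The listed hours sum to the 243 reported, instrumentation development and field work together come to about 61 percent of the program, and instrumentation development alone comes to about 31 percent.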

In the initial training session a complete review of the Response Time Analysis Study was presented. Its origin, objectives, methodology and potential implications for the Kansas City, Missouri, Police Department and the law enforcement community were discussed. Emphasis was given to the necessity of systematic and honest collection and recording of observations.

A discussion was held on the rules and regulations of the department as they applied to civilian employees. This included a review of the legal rights and obligations of department members. Specific emphasis was given to the following administrative guidelines regarding study personnel, which were formulated by Response Time Analysis Study staff and then approved by the Commander of the Operations Bureau:

1. Project staff shall treat survey data, incidental observations, and official departmental business as confidential unless release is authorized by the Project Director.

2. Survey data and other information incidental to project objectives will be provided to the department for matters involving criminal investigations.

3. Departmental personnel involved in processing and having access to project data shall refrain from discussion of such information, regardless of how incidental, unless authorized to do so by the Project Director.

4. Sworn personnel accompanied by project staff will remain anonymous to project reports. Information obtained from communications and field operations will be statistically tabulated in aggregate form for analytical purposes only.

5. Civilian study personnel are not permitted to assist sworn officers unless dire necessity indicates such behavior is appropriate. However, study personnel are required to provide assistance, i.e., physical or other reasonable actions, to sworn personnel upon command, or when it is obvious and apparent that specific situations dictate such actions.

6. Survey data and other extraneous information obtained by project staff, incidental observations, etc., will be exempt from departmental use for disciplinary purposes against sworn personnel except for those incidents involving criminal conduct. Project employees are required to report both illegal actions and incidents of questionable legality to the Field Operations Supervisor.

These guidelines were distributed in an Operations Bureau Memorandum to all members of the Kansas City, Missouri, Police Department. It specified a code of conduct distinguishing the project staff from other department members and insured pledges of confidentiality would be honored.

Observer orientation also included several hours of
instruction regarding the operations and
organization of the police department which
included a tour of police headquarters, various
specialized units, the county jail, and the
municipal and criminal courts. Presentations were
made on the organizational structure of the
department, allocation of resources, operations of
division stations, jurisdictional areas delineating
police responsibilities, and the criminal justice
system. This orientation provided observers with a
basic understanding of the organization being
researched and its relationship to other judicial
systems.

One of the major training components, which
required over forty hours of instruction, was
police work itself. This segment focused on police
training and field procedures applicable to police
patrol. An introduction to police work was
presented in a training film entitled "Law and
Order" which depicted different aspects of police
work in Kansas City, Missouri. Instruction was
given in self-defense, first aid, equipment usage,
department procedures for handling specific
incidents and on-scene criminal investigations.
Observers were also familiarized with the uniform
crime reporting policy, department reporting forms,
report writing procedures and beats targeted for
field observations.

In the methods section the observers received an
introduction to research methodology and field data
collection techniques. Observers received
instruction in role playing and observational field
procedures, which could be utilized in reducing
observer bias and optimizing data collection
workloads. Additionally, discussions were held on
appropriate attire and acceptable equipment which
would offer the most unobtrusive appearance for
observers in police-citizen encounters.

Approximately thirty percent of the training
program focused on instrumentation development.
Initially, a review of the observational instrument
was presented in the context of project objectives.
Subsequent meetings examined instrument items,
operationalization of terms, refining skip patterns
and simulating encounters to be coded. Extensive
sessions were conducted throughout the training and
pretest periods in order to review and revise the
field instruments and problems identified in the
collection of data. To assist in clarifying some of
the more complex terms and instrument items,
observers were divided into groups of three to
research and recommend concept definitions and
syntax of the items for the observer survey form.

Field work was conducted throughout the training
and pretest periods. Observers initially rode in
police cruisers in different parts of the city for
a general orientation of patrol and to become
familiar with dispatch communications procedures,
policies, and communication jargon. They were
instructed not to take notes, but simply to act as
observers of mundane police activities. This
allowed them to become familiar with police work
and the officers without being burdened by data
collection. Once a degree of familiarity and
credibility was established, some limited data were
collected to orient both the observer and the
officer to what would become the observer's normal
work routine. After instruments were constructed
and equipment acquired, each observer was
accompanied by the principal analyst and the
operations analyst in charge of establishing the
observer component and field instruments for a
complete tour of duty during which time measurement
differences were monitored and field collection
techniques discussed.

Throughout the training period a continuing
dialogue on the need for qualitative data and the
honest reporting of mistakes was encouraged.
Meetings with the Kansas City, Missouri, Chief of
Police, the Response Time Analysis Study Project
Director, several consultants and staff were held
to emphasize the need for maintaining a high
standard of integrity in conducting field
observations. This theme continued to be emphasized
throughout the pretest and actual collection phases
of the study. In order to document the extent to
which observers conformed to project guidelines,
however, adequate supervision needed to be provided
and quality control checks implemented.

A sector sergeant from the Kansas City, Missouri,
Police Department had been selected by the Project
Director to supervise the observer component
following futile efforts to solicit a person who
met the qualifications that had been defined for
the position. Given his nine years of street
experience, the rank of sergeant and thorough
familiarity with police operations and department
policy, it was reasoned that novice observers would
find it extremely difficult to fabricate data
pertaining to response times and on-scene police
activities.

Training emphasis for the Field Operations
Supervisor was placed on research methodology and
the study objectives. He was familiarized with the
study components, available literature pertaining
to previous research on response time and other
observer programs. Briefings on supervisory and
observer responsibilities, quality control systems
and department liaison were conducted with project
consultants and study staff. Most of the training,
however, resulted from first-hand on-the-job
exposure in working on the observer selection,
training, and pretesting phases of the study.

Once study objectives had been articulated and a
methodology developed, department cooperation and
support had to be secured. This required those
individuals in the department most affected (or
threatened) by the study to receive a thorough
orientation of project plans. Given the
hierarchical structure of the police department,
all levels of the organization had to be informed.
Since the areas targeted for observation included
all three divisions, commanders, desk sergeants,
sector sergeants and patrol officers from each
division were familiarized with the study.

There were many problems which could be anticipated
in the conduct of this kind of research. For
example, the tendency of police officers to be
suspicious could result in observers being labeled
as spies. In addition there was some danger of
information distortion as it filtered through the
different organizational levels of the department.
Finally, observers once accepted in the field
setting might be pressured to take a more active
part in police work.

To minimize these and other concerns a retired
Kansas City, Missouri, police sergeant was hired as
an assistant Field Operations Supervisor to help
maintain sound working relationships between
project staff and operational personnel. He was
well qualified to act in the liaison capacity,
having served in the department's operations
division for nineteen years. During his tenure on
the department he had established a reputation of
dependability and personal integrity.

The assistant field supervisor's primary duties
included:

1) Meeting with and orienting district officers to
the project and discussing with them any problems
resulting from the observational program.

2) Familiarizing desk sergeants with the Response
Time Analysis Study and observer allocation needs.

3) Informing pertinent command staff of study
objectives, project progress and potential
implications of research findings.

4) Interviewing field sergeants to formulate
observer procedures when riding with officers and
to ensure that police personnel were not
discriminately assigned due to observer deployment.

5) Maintaining a general knowledge about the
organizational environment and receptivity to
various project-related procedures.

6) Monitoring personnel changes of district
officers assigned to the target areas and
familiarizing newly assigned personnel with the
study.

The assistant Field Operations Supervisor was also
required to submit a quarterly report to the Field
Operations Supervisor regarding feedback from
police officers indicating any problems encountered
as a result of observer data collection procedures
or the conduct of the observers themselves.

The following quality control checks were
established and monitored during field data
collection:

1) All data submitted to the Field Operations
Supervisor had to be reviewed beforehand and
initialed by each observer to ensure its
completeness and accuracy.

2) Police officer activity sheets were checked
against the observer's log of eligible incidents to
ensure that data were collected on each call.

3) Wrist watches worn by the observers were
synchronized every two weeks with the master
recorder located in the communications-dispatch
center. Variations of time differences were
recorded in order to identify faulty time pieces.
In addition, periodic battery inspections of watch
modules were made to avoid malfunctions.

4) Chronological logs were developed to monitor
disciplinary, managerial, administrative, research,
and equipment problems. Information was scrutinized
to identify if problems clustered in specific
areas, were randomly dispersed among observers or
were manifest in specific individuals.
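
Checks 2) and 3) above amount to simple
bookkeeping, and a minimal sketch may make them
concrete. The code below is illustrative only: the
function names, the incident identifiers, and the
30-second tolerance are assumptions made for this
example and are not taken from the Response Time
Analysis Study.

```python
from datetime import datetime


def missing_incidents(activity_sheet_ids, observer_log_ids):
    """Incidents on the officer's activity sheet that the observer did not log."""
    return sorted(set(activity_sheet_ids) - set(observer_log_ids))


def watch_offsets(master_time, watch_readings):
    """Seconds by which each observer's watch differs from the master recorder.

    A positive value means the watch runs ahead of the master recorder.
    """
    return {
        observer: (reading - master_time).total_seconds()
        for observer, reading in watch_readings.items()
    }


def faulty_watches(offsets, tolerance_seconds=30):
    """Observers whose watch offset exceeds the (assumed) tolerance."""
    return sorted(
        observer
        for observer, offset in offsets.items()
        if abs(offset) > tolerance_seconds
    )


if __name__ == "__main__":
    # Hypothetical data for one observer shift and one synchronization check.
    print(missing_incidents(["A1", "A2", "A3"], ["A1", "A3"]))
    master = datetime(1976, 1, 1, 12, 0, 0)
    readings = {
        "observer_1": datetime(1976, 1, 1, 12, 0, 5),
        "observer_2": datetime(1976, 1, 1, 11, 58, 0),
    }
    print(faulty_watches(watch_offsets(master, readings)))
```

Recording the offsets at every synchronization,
rather than only resetting the watches, is what
allows a drifting time piece to be identified over
successive checks.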

Once observer instruments had been checked by the
Field Operations Supervisor, they were forwarded to
the Quality Control Clerk who was stationed in the
downtown administrative and analysis office. The
primary responsibility of this person was to
catalogue field forms by precoded number and
disseminate them to the appropriate collection
component supervisor.

Now that observer data collection has been
completed for over eighteen months, evidence
indicates that the observer component experienced
minimal problems. Exit interviews of observers
before their departure substantiate earlier
supervisory and consultant reports regarding the
quality of data collected.

The "control effect" discussed earlier appears to
have diminished as a major limitation inherent in
this observational research given the number of
other factors which also influenced the officers'
performance while data were being collected. The
"biased-viewpoint effect," which signaled the
danger of an observer becoming co-opted, was
checked almost totally from the outset by the
observer deployment matrix which required every
observer to rotate beat-watches following each week
of data collection. Frequent meetings between the
project's liaison officer and police officers also
helped reduce the chance of this problem surfacing.
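
The weekly rotation can be pictured as a
round-robin schedule. The sketch below is an
assumption about how such a deployment matrix
might be laid out; the paper does not describe the
study's actual matrix, and the function name and
observer and beat-watch labels are hypothetical.

```python
def deployment_matrix(observers, beat_watches, weeks):
    """Round-robin schedule: each observer moves to the next beat-watch weekly.

    Returns a list with one dict per week mapping observer -> beat-watch.
    Assumes there are at least as many beat-watches as observers (and at
    least two beat-watches), so that no two observers share an assignment
    and every observer changes assignment from one week to the next.
    """
    n = len(beat_watches)
    if n < 2 or len(observers) > n:
        raise ValueError("need at least two beat-watches and no more observers than beat-watches")
    return [
        {obs: beat_watches[(i + week) % n] for i, obs in enumerate(observers)}
        for week in range(weeks)
    ]


if __name__ == "__main__":
    schedule = deployment_matrix(
        ["obs_A", "obs_B", "obs_C"],
        ["beat1-day", "beat2-evening", "beat3-night"],
        weeks=3,
    )
    for week_number, assignment in enumerate(schedule, start=1):
        print(week_number, assignment)
```

Under this layout no observer rides the same
beat-watch in consecutive weeks, which is the
property the text credits with limiting
co-optation.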






Two suggestions for EMS administrators are
warranted following experiences obtained from the
project just reviewed and consultation with other
researchers and administrators. First, there is
absolutely no reason to feel apologetic or
defensive regarding research possibilities within
your own agencies. So little is known about even
the most elementary assumptions in urban emergency
services that researchers are often themselves
embarrassed. If research contracts are negotiated
or grants developed, make sure provision is made
for a special liaison consultant to evaluate the
work being conducted for your own benefit. This
person could be recruited locally and would provide
valuable insight into interpretation of project
findings and assessment of implications. Secondly,
in order to respond to officials in other
administrative positions and the press, allow
sufficient funding to establish an implications
committee which would explore consideration for new
programs in the event that shallow results were
reported. All too often researchers have told
public administrators what doesn't work without
suggesting constructive alternatives.



Developing Indicators 
of Program Effectiveness:
A Process 

George L. Kelling 
Police Foundation 
Washington, D.C. 






One of the requirements for evaluating any program is that adequate measures of program effectiveness be devised. Although many
programs have explicit outcome measures such as might be the case for a communication system designed to decrease response time, it
is often necessary to devise "indicators" of effectiveness which are somewhat indirect or removed by some steps of inference from the
effects actually intended. In this paper Kelling describes some of the problems in developing indicators and points to ways of
maximizing their validity.






The development of indicators of program
effectiveness is tricky and important business.

Perhaps the easiest way I can make this point is to
give some examples from policing, the area in which
I do my own research and evaluations. I will
present and discuss three examples. I will then
close by describing the process which I feel is
necessary to develop indicators for evaluations.

One of the problems in policing about which there
has been recent concern has been the problem of
police brutality. Many programs have developed to
deal with this problem. Solutions include citizen
review boards, peer review panels, training,
retraining, enlightened disciplinary procedures,
higher education, psychological counselling, etc.
An indicator of police brutality is the number of
complaints filed against police officers. But I
know of a city where police-citizen complaint
centers advertise their location, where citizens
are encouraged to complain if they are not
satisfied with services, where citizens' complaints
are processed rapidly and continuously, and
citizens are kept informed of the procedures and
actions that the department takes. I know of
another city where citizens can't locate where they
are to complain, are discouraged from complaining,
and are never informed of the outcome of their
complaints. The first city has many complaints. The
second city has few.

The point in this example is relatively simple. The
meaning of indicators is relative to their context.
With all deference to Gertrude Stein, "A complaint
is not a complaint, is not a complaint, is not a
complaint." The same thing could be said of
arrests, crime statistics and a host of other
indicators.

In this example it is clear that the activities of
one organization have encouraged citizens to
complain and made the complaint process so
accessible that it is not unlikely that they will
accumulate many more complaints than the department
which discourages complaints and makes complaint
locations inaccessible. The number of complaints,
then, may not be an indicator of brutality, but
rather an indicator of the success of a complaint
processing system. It may also be an indicator of
brutality, but that may be extremely difficult to
discern.

Likewise, it would be possible that a police
department could, with great fanfare and publicity,
embark on a program to reduce complaints through
training, recruitment, discipline, etc. That
program, attended by publicity, could call
attention to police behavior to persons who, in the
past, simply gave it no attention ("What the hell,
so police do thump once in a while"), thus
modifying public expectation of behavior, which in
turn would lead to increases in complaints. Those
increases could occur in spite of the fact that
officer behavior improves. It is conceivable then
that an increase in complaints could indicate a
change in citizen expectations rather than officer
behavior.

Let me give yet another example in this area. We
know that there is a great gap between actual
levels of crime and reported crime. How large that
gap is varies from place to place and from crime to
crime, but generally it is known that 50% of crime
goes unreported.

Let us suppose that a department goes into a
vigorous anticrime program which includes crime
specific strategies, eliciting more information
from citizens, and improving police-citizen
relations. Let us further suppose that in the
process of conducting this program the police
manage significantly to affect the public
perception of their effectiveness. It is not
unlikely that many citizens who have failed to
report crimes because they have felt the police
could not or would not do anything about it
(remember that 50% of crimes go unreported) would
start to report crimes which they would not have in
the past. If reported crime is an indicator of
effectiveness, reported crime could go up and the
program could be viewed as a failure. In fact, the
increase in reported crime could mean that the
department had been successful in improving public
confidence in their performance. (Rape is a good
example. Rape is seriously underreported. Rape
victims are, more and more, being encouraged to
report rapes and, in response to public pressures,
police departments are improving the quality of
their handling of rape victims. It is conceivable
that reported rapes will increase but that does not
mean that actual rapes have. They may have, may not
have, or may have stayed the same. Increase in
reported rape statistics can be the result of
changes in public mores and improved police
procedures.)
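
The arithmetic behind this point can be worked
through in a few lines. The sketch below treats
reported crime as actual crime multiplied by the
share of crimes that citizens report. The 50%
starting rate comes from the text; the crime counts
and the rise to a 60% reporting rate are
hypothetical figures chosen only for illustration.

```python
def reported_crimes(actual_crimes, reporting_rate_percent):
    """Reported crime = actual crime x share of crimes that get reported."""
    return actual_crimes * reporting_rate_percent / 100


def percent_change(before, after):
    """Percentage change from 'before' to 'after'."""
    return (after - before) * 100 / before


if __name__ == "__main__":
    # Before the program: 1,000 actual crimes, half of them reported.
    before = reported_crimes(1000, 50)
    # After the program (hypothetical): actual crime falls by 10 percent,
    # but improved public confidence raises the reporting rate to 60 percent.
    after = reported_crimes(900, 60)
    print(before, after, percent_change(before, after))
```

In this illustration actual crime falls by 10
percent while reported crime rises by 8 percent, so
the indicator moves in the opposite direction from
the outcome it is meant to track.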

One more example. One of my colleagues, Mr. John
Heaphy of the Police Foundation, has been examining
the issue of arrest productivity in police
departments. (Arrests have been one of the
historical measures of police productivity.) As he
went from department to department he found
tremendous disparity in the numbers of arrests that
officers made which seemed to have no relationship
to reported crime or victimization levels. That led
him to the second question, "What does an arrest
mean?" After months of immersing himself in that
data, he has identified the myriad of factors that
can be, and are, related to arrests. (Organizational
factors, police style factors, reward factors,
neighborhood factors, actual crime factors,
definition of crime factors, court factors, etc.,
etc., etc.) The point is that the meaning of arrest,
as with all indicators, is tied into a variety of
contextual issues. To know what arrest, complaint,
crime, morale, job satisfaction, etc. indicators
mean, each must be seen within a context. If the
context is not understood, indicators can be
interpreted as meaning one thing when, in fact,
they mean something diametrically opposite.

I know of a proposed evaluation of police services
in which two principal indicators of police
performance are response time (how long it takes
for a police vehicle to respond to a call for
service), and police passings (the number of times
a police car passes a particular point). The
assumptions are that if a police vehicle responds
rapidly, criminals will be apprehended or aborted
and citizens more satisfied, and that if a police
car passes a particular point often, criminals will
be deterred and citizens made to feel more safe. It
seems logical that both response time and passings
are indicators of police performance. Yet while
that appears logical, there is no empirical
evidence that either fast response time or number
of passes accomplishes anything.

The theories have been that rapid response time and
passings can lead to crime reduction, apprehension,
and citizen safety. But those have remained, at
least until very recently, unexamined theories and
unexamined assumptions.

The development of these two indicators of patrol
effectiveness has been an interesting phenomenon in
policing. Measuring patrol effectiveness has been a
particularly thorny problem in policing since much
important police activity (public service) has been
inappropriately relegated to second level
importance and crime related activities (crime
related functions account for, at the most, 20% of
police time) given exaggerated importance. The
combination of the excitement of the criminally
related activities and the "Kojak Syndrome" has led
both the police and students of the police
(including researchers and evaluators) to virtually
ignore public service functions and indicators in
evaluation of the police. Coupled with that
functional bias, and the difficulty of measuring
effectiveness, response time and passings
(technically but expensively measurable), based
only on theory and logic, have come to be
substituted for actual goals. Police and evaluators
have become willing to assume that if response time
is low and passings often, that, in itself,
indicates success. In point of fact, it indicates
only that response is low and passings often. Means
have been substituted for goals.

One has to be careful not to be too harsh about
this, however. Measuring goal attainment can be
extraordinarily difficult. Oftentimes
administrations have to find process (means)
indicators to demonstrate their effectiveness since
they lack the funds, time and skills necessary for
evaluation and, under pressure, they must do as
best they can. Likewise it is often the case that
as a result of lack of funds or finely developed
evaluation methodology, evaluators simply have to
settle for process indicators. When that is the
case and the theoretical biases and the reliance on
means rather than goals are made clear, that is
acceptable. The mistake occurs when administrators
and evaluators come to confuse means and goals.
Short response time and many passings can be
achieved, but in achieving those ends, funds and
creative energies are withdrawn from finding
techniques which obtain the goals.

Arrests are oftentimes considered an indication of
police performance. The theory is that the more
arrests an officer makes, the more crime he is
stopping, the more proficient he is as an officer,
and the more he is contributing to the solution of
a major social problem. Many people agree with
that. Labeling theorists argue otherwise. They
argue that arrests stigmatize an individual, can
create a deviation amplification feedback loop, and
make the problem worse for both the individual
arrested and society.

With this third example, I am trying to make two
points that both evaluators and agency
professionals must be extremely clear about. 1)
Program evaluators ought to strive to link theory
and practice. 2) There are value ambiguities in
many of the goals of social programs.

Regarding the latter point, the long range goal
(value) regarding crime in our society seems to be
fairly universally agreed upon, that is, to work
towards a situation where citizens can live in
their homes and in public places with relatively
little fear of being victimized. But the interim
goals on the way to that broad social goal are not
always agreed upon. For some, police are to arrest
offenders and present them for rapid processing;
for others, the police are to divert offenders,
especially young offenders, from the criminal
justice system. Cost effectiveness and cost/benefit
models tend not to emphasize the function that
values have in determining program goals or their
measurement. As complicated as the cost/benefit and
cost-effective equations are, they are only
meaningful when placed in the context of values.

Dealing with social problems involves delicate
value and norm decisions. No doubt it would be
possible to deal with many problems more
effectively if we were not restrained by values and
standards. Crime is an excellent example. Concern
for issues like privacy, due process, and humane
handling of individuals restrains organizations as
they work towards their goals. The point is that
agency personnel have to context their goals within
the broad values of society. Goals are always
values or contribute to values. I am now asserting
this as more than an abstract truism. It is an
important fact that politicians seem often to be
more aware of than we, as they ignore our cost
benefit calculations.

Further, theories play an important function in our
work. As evaluators and agency professionals work
together to establish goals and indicators of those
goals it is important that they understand that all
social practices have, or at least ought to have,
explicit theoretical bases and that the evaluation
of program outcomes should be a test of theory.
While some of our evaluation activities are mundane
and tedious, others call for us to return
rigorously to theory and attempt to understand the
relationship of the program evaluated and the
theoretical bases of that program (explicit to the
agency or not). A program is, or at least ought to
be, the operationalization of theory. A critical
point in the process of bringing together values,
theories and programs is that of establishing,
explicitly, program goals and indicators. True,
this may be a struggle, and true too, it may result
in incomplete explanations, but the more evaluators
and agency personnel struggle to establish the
causal linkages, the more relevant will be their
findings. Evaluators, at best socialized in theory
development, and operating personnel, at best
socialized in theory application, have rare
intellectual opportunities when trying to define
"What works?", "How do we know it works?" and
finally "Why does it work?".

How then ought we develop indicators? It seems to
me the process is at least a threefold one.

In the first place, researchers and evaluators have
to develop indicators of program effectiveness
through the process of total immersion in agency
activities. They cannot sit down in several
meetings with agency administrators and expect to
know agency or professional goals, skills and
practices, and the assumed linkages between them.
Agency administrators have a point of view but
often they are far removed from actual practice.
Organization operatives have a point of view, but
that too has its limitations. What the evaluators
must do to fully understand practice and goals goes
beyond conversation and interviewing. Let me give
several examples.

We are now beginning to develop plans to see if it
is feasible to do an evaluation of foot patrol in
New Jersey. (New Jersey provides an interesting
site as foot patrol operates in 28 cities and is
fielded by the state.) In the process of developing
the indicators of foot patrol (I must confess that
we are also developing hypotheses, working
relationships, examining data bases, etc., but even
if we weren't doing the other things, we would have
to do the following to develop indicators) we have:

- Met with top officials and administrators.
- Met with field commanders.
- Met with heads of records units, etc. (to see if
  data are available, how much it will cost to
  access, and how much has to be generated over and
  above that which is available).
- Met with a group of supervisors and
  administrators to discuss what foot patrol is to
  accomplish and how we can tell if it is
  accomplished.
- Met with a group of patrol officers to discuss
  what foot patrol is to accomplish and how we can
  tell if it is accomplished.
- Walked foot patrol with patrol officers (so far
  staff has walked a total of 15 shifts and will
  probably walk a total of 15 more) in a variety of
  cities.
- Rode with foot patrol sergeants (so far a total
  of 5 shifts).
- Formed an advisory group of 2 foot patrol
  officers and 1 sergeant from each of the 5
  departments with which we plan to continue our
  exploration.
- Asked each of the 5 departments to form a small
  task force to work with.
- Talked to citizens, including merchants, street
  people, and local residents, about their views
  about foot patrol.
- Met with state officials in two agencies to
  discuss with them their perceptions of the goals
  of the program.






The purpose of all these activities was to educate
ourselves to what foot patrol was, what it was to
accomplish, what it seems to accomplish, and to
hypothesize about the causal linkages between means
and ends. (Yes, this is terribly time consuming and
expensive. I would guess about $40,000 worth of
staff time and resources will go into developing an
appropriate design and indicators, not counting
agency staff time, and further it may turn out that
after all that time and effort a major evaluation
would be so difficult and expensive that only a
very modest one would be worth the investment,
perhaps one which would cost less than the planning
itself.) But we believe that only in this immersion
can we fully work with agency people to establish a
proper design and indicators.

In Kansas City, we worked with a task force of
patrol officers and supervisors for a year to
develop a design and indicators. That task force
also recommended, and the KCPD approved, that two
police officers work full time with the evaluators
during the entire length of the experiment. (True,
the functions of those officers went beyond working
with us on indicators and included such things as
monitoring the experiment, but throughout the
experiment one of their major tasks was to help us
understand what data meant. One of them, Charlie
Brown, now works full time for the Police
Foundation and daily works with non-police
researchers and evaluators to help them understand
what they are seeing.)

Please understand that I am not saying that all
wisdom regarding what data means rests with agency
personnel. I very strongly believe that not to be
the case. They have their own biases,
methodologies, and vested interests which keep them
from fully understanding what they see.

Instead, I am suggesting that it is important to
develop an interaction between persons deeply
involved in research and those deeply involved in
practice. It is out of that interaction that
indicators develop. The development of indicators
is not a research enterprise alone. It is not a
practice enterprise alone. It is a process between
carefully trained inquirers and carefully trained
practitioners. This process must be gone through at
some point. If it is not gone through early, it
will be struggled through later between antagonists
saying, "That's not what I do", "That's not what I
meant", or, "That's not what it means". If the
process is properly gone through, the process
results in a contract between evaluators and
agency. That contract is called a design, developed
by both agency and evaluators.

One last word on this. I am not suggesting this
process as a way to do it. I am suggesting that it
is the only way to do it. (Even if the evaluators
are doing their first, fifth or twentieth
evaluation in a particular agency.)




The second aspect of the development of indicators
is that the researchers have to return to
theoretical and practice literature. Most agency
practitioners become fairly removed from the
literature of their own field. Most have difficulty
keeping up with current research, let alone
maintaining their interest in theory development,
causal linkages, etc. But that is an important task
for researchers and one for which they are
extensively trained. The development of indicators
is not a mechanical job that can be done
independently of the intellectual traditions of a
field. As an example, my own feeling is that those
who started to use response time and passings as
indicators of patrol effectiveness made two
mistakes. One was that they confused means with
goals. The second was that they simply did not
understand the historical traditions of the police.
Response time and passings are almost completely
related to the crime related functions of the
police. (Proponents of these as indicators may
argue that response time has broader application
but if you read their materials, any other
functions of response time are relegated to a
distant, distant second.) The problem is that such
an emphasis ignores many of the important
historical traditions in policing. This problem of
research and evaluations lacking context has been a
special problem in policing, where few
practitioners write and universities are only
starting to do research in policing. (For all
practical purposes, no research exists on police
techniques prior to 1962.) Thus researchers carry
the responsibility of trying to ground their
research (evaluation) in theory. That may be
difficult (the Police Foundation accomplishes this
partly by having an Evaluation Advisory Group, all
of whom are respected academics, whose purpose is
to force evaluations to go through the process of
trying to tie their work to historical trends and
establish the causal and theoretical relationship
between findings and practice), and often is
exceedingly painful, but it is absolutely necessary
for a field of practice.
And thirdly, the task of the eialifttor as he 
developed indicators is to help the practitij^ner 
context their experiences. In the first point, I em- 
phasized the need for the evalu'ator. to* immerse 
hirDself ifi the agency and learn from the agency. 
Now I am erapbasizing the other side of this. It is 
the otffigation cjf the evaluator to bring to the 
operating agency -the contexts and theoretical trad- 
ition^ discussed ab9ve or the evaluator does not 

'ju^t bring the agency technical skills or, speak to 
the .agency on its terms, but rather^brings a critical 
capacity both as ^ result of his/her trairting and the 

• present stat^)f the literature. He/she conveys to 

"^iKc agency specific research findings ^nd critical 
atialyips' c^. the agencies' program. The evaluator 
brings these, tradition^ in tfle form of^^ constant 
probing and (Questioning. He/she, by challenging, 
even irreverently, the present beliefs, can (4>ntrib- 
ute to' the Teaming of the agency. Again, the re- 



searcher is an inquirer. The evaluaior has to force 
' ^ihe pradiiiioMp- to' review his ideas in the context ' 
, ^pnheory, ana history. 

Conclusion

I have presented the development of indicators as a process which
occurs between researchers and program professionals. It is a process
which I feel is indispensable in good research and evaluation. It is
time consuming and expensive for both agency and researcher. It calls
for rigorous scholarship on the part of the researcher both in
his/her background work and field work. It calls for a real and
extensive commitment out of program professionals. I suppose it is
like milking a camel. It is difficult. It is painful. You will get
kicked, spit on, and bruised. It takes a long time. People will think
you crazy. But if you put your mind to it, really concentrate, and
MEAN IT, REALLY MEAN IT, you will be able to milk a camel.




Measuring the Monetary Value
of Lifesaving Programs*

Jan Paul Acton
Economist
The Rand Corporation
Santa Monica, California



It very often happens in evaluating some program or other
intervention that the issues ultimately boil down to a matter of
economics. Specifically, the question which must be answered is
whether in view of its costs an intervention is worth doing. There
are two distinct problems which have come to be known as
cost-effectiveness and benefit-cost analysis. Jan Acton, in the two
papers which follow, discusses these two types of analysis and tries
to show how the more fundamental problem of the benefit-cost ratio
would be approached in the context of provision of emergency medical
services.

I. Introduction

A multitude of public investment and regulatory decisions which have
some effect on mortality and morbidity rates are made by
legislatures, administrative agencies, and the courts every year.
Typically, as in the case of highway safety engineering, the choice
which confronts the public decision-maker is between reduced
mortality rates and hence longer life expectancy for some group, and
more resources available for other purposes (e.g., additional miles
of highway construction or a reduction in taxes). A decision to
require something other than the minimum technologically feasible
mortality rate reflects in effect a judgment that mortality (or
safety) is not to be given lexical priority, in public decisions,
over all other commodities which money can buy, a judgment which is
certainly reasonable and in accord with everyday decisions made by
households. If mortality is not to be given lexical priority, some
other standard or procedure is needed to determine which projects are
worthwhile. In particular, a procedure is needed for measuring the
benefits of such programs in units which can be readily compared with
the costs.

In some constrained decision situations, the costs can be expressed
in units of an identified commodity: for example, a school board may
be faced with the decision of how much of its budget to spend on
school bus safety, knowing that every additional dollar spent on bus
monitors and drivers' salaries will reduce the quality of education
by a certain amount. The choice between safety and the quality of
education is easily understood, and could be assessed directly
according to the preferences of the public as represented by the
school board. More generally, money allocated to safety will be taken
from a fungible source which has many alternative uses. In such
cases, there is no good alternative to measuring the cost of safety
in dollar terms, so that the evaluation of such a program will
require the decision-maker to place a dollar value on safety, at
least in an implicit sense. (Even in the school bus safety example,
it is not appropriate to phrase the safety evaluation question in
terms of educational quality units if changing school taxes is a
viable option.)

How are we to go about placing a dollar value on the health and
safety effects of a public program? The method which is in accord
with the theoretical postulates of welfare economics is to measure
benefit as the sum of all affected individuals' willingness to pay
for the proposed program.* We can imagine each household being
informed of the potential effect of the proposed program on its
members' own safety and the safety of all those they care about, and
then sending a ballot to the appropriate agency which indicates the
maximum amount they would be willing to pay to have the program
enacted. Their response will reflect their risk aversion, their
anxiety of dying from the particular cause which is to be modified by
the program, their financial circumstances, and the objective
reduction in risk to them and their friends. If the aggregate
willingness to pay exceeds the costs of the program, then the program
is worthwhile in the sense that everyone could be made better off by
its adoption: it is possible (though probably not administratively
practicable) to charge each beneficiary less than it is worth to him
and still cover the program costs. This "potential Pareto
improvement" criterion is the formal theoretical justification for
cost-benefit analysis, and it applies as well to evaluation of
programs to reduce mortality or morbidity as to more traditional
subjects like irrigation evaluation.*
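
The ballot procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the household bids and the program cost are hypothetical numbers, and the proportional charging rule is one convenient way (not a rule taken from the paper) to show that each beneficiary can be charged no more than the program is worth to him while the cost is still covered.

```python
def aggregate_willingness_to_pay(max_bids):
    """Sum the maximum amounts reported on the (hypothetical) ballots."""
    return sum(max_bids)


def passes_potential_pareto_test(max_bids, program_cost):
    """True when aggregate willingness to pay exceeds the program cost."""
    return aggregate_willingness_to_pay(max_bids) > program_cost


def proportional_charges(max_bids, program_cost):
    """Charge each household the same fraction of its bid (an assumed rule).

    When the test is passed the fraction is below one, so no household
    pays more than its bid and the charges add up to the program cost.
    """
    total = aggregate_willingness_to_pay(max_bids)
    share = program_cost / total
    return [bid * share for bid in max_bids]


# Hypothetical ballots (dollars) and a hypothetical program cost.
bids = [120.0, 40.0, 0.0, 310.0, 75.0]
cost = 500.0

worthwhile = passes_potential_pareto_test(bids, cost)
charges = proportional_charges(bids, cost)
print(worthwhile, charges)
```

With these numbers the aggregate bid is larger than the cost, so every household with a positive bid is charged less than its stated maximum.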

This method, then, would define the benefit of a program which can be
expected to save ten "statistical" lives out of a population of
100,000 as the total value the 100,000 members of this population
place on having the probability of each individual's death reduced by
one in 10,000. An alternative method, and the one which is actually
used in almost all evaluations of public health and safety programs,
is to attempt to actually place a money value on the lives that the
program would be expected to save if it were adopted. In the example
above, the "benefit" of the program would be 10V, where V represents
the average "value of a human life." The method frequently used in
practice for the heroic job of assessing V is to calculate the
so-called "livelihood" measure: the present value of lifetime
earnings for a representative individual. The normative viewpoint
which apparently motivates this approach is either that (1) people
are properly thought of as the chattels of the state, and the loss of
a life has a cost to the state comparable to the cost of a slave's
death to his owner; or (2) the proper objective of public policy is
to maximize Gross National Product.
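
The arithmetic of the ten "statistical" lives example can be made concrete with a short Python sketch. The population size and the one-in-10,000 risk reduction come from the text; the average willingness to pay per person and the value assigned to V are hypothetical figures chosen only to show how the two benefit measures are computed and compared.

```python
POPULATION = 100_000      # from the example in the text
ONE_IN = 10_000           # each person's death risk falls by 1 in 10,000


def statistical_lives_saved(population, one_in):
    """Expected number of lives saved by the risk reduction."""
    return population / one_in


def willingness_to_pay_benefit(avg_wtp_per_person, population):
    """Benefit as the sum of individual willingness to pay."""
    return avg_wtp_per_person * population


def value_of_life_benefit(lives_saved, value_per_life):
    """Benefit under the alternative method: lives saved times V."""
    return lives_saved * value_per_life


lives = statistical_lives_saved(POPULATION, ONE_IN)

# Hypothetical inputs, not figures from the paper.
avg_wtp = 20.0            # dollars each person would pay for the reduction
assumed_v = 150_000.0     # an assumed livelihood-style value of V

wtp_benefit = willingness_to_pay_benefit(avg_wtp, POPULATION)
v_benefit = value_of_life_benefit(lives, assumed_v)

# Value per statistical life implied by the willingness-to-pay responses.
implied_value_per_life = wtp_benefit / lives
print(lives, wtp_benefit, v_benefit, implied_value_per_life)
```

The two measures agree only if the assumed V happens to equal the value per statistical life implied by what people say they would pay.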

A third procedure for benefit valuation has not been employed in the
past, but is potentially valuable. Since various public agencies and
legislatures have been confronted with many decisions which in effect
involve tradeoffs between dollars and mortality rates, there is
considerable precedent for current decisions of a similar sort.
Analyzing these precedents could help to increase the consistency of
government decision-making.

Before proceeding to discuss these basic approaches to measuring the
benefit of safety-enhancing programs in more detail, it is useful to
indicate some of the seemingly related issues which, from a normative
viewpoint, are in fact quite different. First, we are not dealing
with the question of how much the government should spend to attempt
to save the life of an identified individual (the coal miner trapped
in a cave-in or the child in kidney failure) who is certain to die in
the absence of government intervention. This is a very difficult
issue because of, among other things, the symbolic importance of
maintaining a public commitment to preserve life, which according to
Calabresi and others is properly viewed differently from the safety
investment issue. Second, we are not attempting to determine the
appropriate amount of compensation or punitive damages to award (to
either the individual or his survivors) for injury or death. While
this issue is related to ours, in that court settlements in such
cases may well influence the amount which private firms and
households invest in safety, the relationship is complicated by
equity considerations and a number of other considerations, including
the desire to establish correct incentives for people whose actions
influence mortality rates. Third, we are not attempting to analyze
the demand for life insurance, since this is determined by an
individual's bequest motive and not by the value he places on his own
safety.



The remainder of the paper considers each of the procedures for
benefit valuation mentioned above, but in reverse order. A final
section summarizes the principal arguments and makes several
recommendations for policy analysts.

II. Political Precedent

The logical first place to look for a source of standards for
evaluating public programs which enhance health or safety is to the
political process. If decisions regarding these programs tend to
reflect a consistent set of values, then these values have a claim to
political legitimacy and should be brought to light.

First, what does it mean for these decisions to be internally
consistent? Investment and regulatory proposals differ in many
dimensions, including the identity of the target population, the
cause of death or disability which is to be curtailed, the nature and
magnitude of the projected effect, various side effects, and cost. To
focus on the implicit valuations which such decisions place on
improved mortality rates, two assumptions are useful: (1) Linearity:
A program which reduces the probability of death by two in 1000 for
each member of a specified group is worth twice as much as a program
which causes only a one in 1000 reduction; and (2) Indifference: The
particular source of death which is to be curtailed by a program does
not influence the program's value; all that counts is the number and
perhaps characteristics of lives saved. If these assumptions are
accepted, then a consistent procedure for assessing the benefit of
programs is to value each of them by the number of lives which it is
predicted will be saved, multiplied by some number representing what
is often called the average "value of life" for the program's target
population. Precedent decisions can be analyzed to ascertain whether
they reflect a consistently applied set of life values.
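
One way to picture the analysis of precedent decisions is to back out the value of life implied by each adopted program, using the two assumptions just stated. The Python sketch below is illustrative only: the program names, costs, and risk reductions are hypothetical, and the tolerance used to call a set of decisions "consistent" is an arbitrary assumption.

```python
def expected_lives_saved(group_size, deaths_averted_per_1000):
    """Linearity: lives saved scale in proportion to the risk reduction."""
    return group_size * deaths_averted_per_1000 / 1000


def implicit_value_of_life(cost, group_size, deaths_averted_per_1000):
    """Dollars spent per expected life saved by an adopted program."""
    return cost / expected_lives_saved(group_size, deaths_averted_per_1000)


def is_consistent(values, tolerance_ratio=2.0):
    """Assumed rule: consistent if the largest implied value is within
    tolerance_ratio times the smallest."""
    return max(values) <= tolerance_ratio * min(values)


# Hypothetical precedents: (name, cost in dollars, group size,
# deaths averted per 1000 members of the group).
precedents = [
    ("program A", 500_000.0, 50_000, 2),
    ("program B", 3_000_000.0, 1_000, 3),
]

implied = {
    name: implicit_value_of_life(cost, size, per_1000)
    for name, cost, size, per_1000 in precedents
}
consistent = is_consistent(list(implied.values()))
print(implied, consistent)
```

Under the indifference assumption the cause of death does not enter the calculation at all; only the cost and the expected number of lives saved matter.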



For any number of reasons it comes as no surprise that public program
choices do not reflect the type of consistency defined above. One
study which examined a number of lifesaving programs found implicit
values of life which ranged from a few thousand dollars (in highway
safety design) to over a million dollars (in an ejection system for
an air force bomber). To some extent this variability may reflect
deviations from one or both of the simplifying assumptions stated
above. For example, a higher and more expensive standard of safety
for airplanes vis-a-vis highways may be justified by the argument
that the threat of a crash seems to produce greater anxiety in air
passengers than in auto passengers, even though the objective
probabilities of death/mile are lower for the former group; this may
generate a disproportionate demand for air safety. (In this vein one
could also point to the disproportionate concern about death by shark
bite or being murdered by a stranger.)

Inevitably, however, much of the variability is the result of
decentralized and varied decision-making processes, special political
interests, and ignorance. Analyzing past decisions for precedents in
defining the appropriate value of safety and health programs would be
useful to the extent that it helped dispel this ignorance and yield
understanding of the implications of consistency for decisions
concerning programs under current consideration.

Ultimately, the study of precedent decisions does not yield an
absolute standard by which to measure benefits of potential programs,
but it does offer a contingent standard which may be useful. If
established program X is generally recognized as worthwhile, and
proposed program Y offers a comparable increase in life
expectancy/dollar expended, then there is a good argument for
adopting program Y. In the absence of a consistent set of values
generated in the political decision process, however, there remains a
pressing need for benefit values calculated on the basis of more
fundamental normative considerations. It is this need which, rightly
or otherwise, is currently being filled by the "livelihood" procedure
for life valuation.

III. Livelihood-Saving Measures of Value 

Livelihood-saving is the most commonly used formal method for
assessing the value of reducing mortality, and has been used as such
for over 50 years. This measure is based on the net present value of
changes in the person's earnings stream. By this criterion, if the
expected livelihood-savings associated with a project exceed the
costs of the project, it is worth undertaking; otherwise the project
is not worthwhile. Despite considerable discussion and use of
livelihood-saving measures in the literature, there does not appear
to be a clear statement of why it might be desirable to employ such a
criterion for funding public programs. In particular, there is no
reason to believe a priori that changes in earnings streams bear any
direct relationship to what society values in health or safety
program outputs.
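
The livelihood-saving criterion reduces to a present value calculation followed by a comparison with cost. The Python sketch below shows that mechanic under assumptions that are not taken from the paper: a flat hypothetical earnings stream, earnings received at the end of each year, an assumed 4 percent discount rate, and hypothetical figures for lives saved and project cost.

```python
def present_value(earnings, rate):
    """Discount a stream of yearly earnings, received at year end."""
    return sum(e / (1.0 + rate) ** t
               for t, e in enumerate(earnings, start=1))


def worth_undertaking(expected_lives_saved, livelihood_per_life, cost):
    """Livelihood criterion: accept if expected savings exceed cost."""
    return expected_lives_saved * livelihood_per_life > cost


# Hypothetical representative individual: $10,000 a year for 40 years.
stream = [10_000.0] * 40
livelihood = present_value(stream, rate=0.04)

# Hypothetical project: 3 expected lives saved at a cost of $400,000.
decision = worth_undertaking(3.0, livelihood, 400_000.0)
print(round(livelihood), decision)
```

Because the measure simply adds discounted dollars, one worker earning $10,000 a year counts exactly as much as two workers each earning $5,000 a year, which is one of the implications the paper goes on to question.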

The livelihood-saving approach may have received the attention it has
because it is relatively easy to apply and gives the impression of
providing an unambiguous numerical answer. It is easy because the
analyst can consult a table to determine the livelihood at different
ages, identified by sex, race, and education. The impression of
numerical precision is more apparent than real, however. A number of
important assumptions underlie the tables, and unless the
decision-maker is conscious of their meaning, he may be unconsciously
supporting a social judgment that he would reject if he faced it
explicitly.



A. Intrinsic Shortcomings of Livelihood
Approaches

The major objection to a livelihood evaluation is that it lacks a
satisfactory normative justification. It is possible to infer from
the way this approach is discussed in the literature that it is
supposed to be justified by analogy to the economic procedure for
valuing a machine or other piece of capital equipment. If a machine
is accidentally destroyed, the resulting economic loss is equal to
either (1) the cost of replacing the machine, or (2) the present
value of the services which the machine would have provided if it had
been saved, whichever is less. If the market for such machines is
competitive, then measures (1) and (2) are equal, and both valid.
Furthermore, the value of the machine's services is equal to the
implicit or explicit rental price of the machine. People can be
viewed as embodying "human capital," the services of which are rented
in the labor market or used in home "production" (housecleaning,
child care, etc.). The rental rate (wage rate) for labor services
will under some assumptions reflect the value of such services in
production. If we are to accept the notion that the social value of a
life is equal to the value of the labor services the person provides,
then the present value of the person's expected earnings (including
"implicit" earnings from home production) is the appropriate measure
of this value.

People are not machines, however. If we accept the view that
production is not an end in itself for people, but rather a necessary
intermediate step which allows us to enjoy the fruits of production,
then the "human capital" approach is clearly inappropriate. Increases
in safety and life expectancy help to ensure the continuation of an
individual's ability to enjoy the pleasures of his life and the
pleasure which his family and friends derive from a continuation of
their relationship with him, and it is the value of prolonging this
enjoyment which should be assessed in measuring the benefit of public
programs which affect safety. While this hedonistic view would not be
appropriate in a slave society (at least from the owner's viewpoint)
or in a society dedicated solely to increasing the Gross National
Product, it seems entirely appropriate in an individualistic society
where the government is viewed as serving the public rather than vice
versa.

The livelihood procedure might still be accepted in practice if it
could be demonstrated that it provides a reasonable approximation to
a measure which does have conceptual validity, or even to our
intuitive notions of what equitable policy requires. For some
judgments at least, this type of justification is clearly lacking.
For example, it is an inescapable conclusion of this criterion that
society should spend no money on programs that extend the lives of
fatally ill children because the programs would produce no change in
their future earnings. Furthermore, most persons would not agree that
it is as important to save one worker earning $10,000 per year as it
is to save two workers with similar personal and family
characteristics, but each earning $5,000 per year. It is even more
doubtful that most decision-makers would want to save men and women
in proportions that depend on their earnings, even if a homemaker's
services are valued at the wages of a domestic worker rather than at
zero. For instance, the livelihood-saving calculation presented below
shows that a white man in his 50's is valued more highly than a white
woman in her 20's. If we were using livelihood-saving as the measure
of value for government health programs, this means we would rather
approve programs that save 55-year-old men than programs saving the
same number of 25-year-old women. It also indicates that it is worth
about twice as much to save one 25-year-old man as to save one
25-year-old woman.

It is doubtful that these magnitudes reflect the rate at which most
people would want public lifesaving and morbidity-saving resources
allocated. There is little direct evidence on this point about
societal preferences, but what exists explicitly contradicts this
implication of the livelihood approach. In Acton, 91 persons were
asked hypothetical questions about which person they would like to
see saved if two seriously injured men arrived at an emergency ward
and there were resources available to save only one of them. The
respondents had to choose between several different pairs of ages.
Approximately one-third (31) of the respondents always chose to save
the younger person; 39 expressed a preference that was single-peaked
in age (peaks generally occurred between 29 and 30 years of age as
does the human capital curve); and 8 were indifferent to all age
pairs. (The remainder were multipeaked or inconsistent rankings.)
Thus, somewhat less than half the respondents expressed a desire to
save lives identified by age that corresponds to the shape of the
livelihood curve.

The livelihood measure assigns a higher value to men than to women at
almost all ages, but this sample rejected such a ranking when asked
to select a man or a woman of identified ages in the emergency ward
question above. The majority of persons (53) selected only on the
basis of age and matched the same ranking they had expressed when
selecting between two men. Nine respondents always selected the man
over the woman, and nine always selected the woman over the man. In
one question, the respondents were asked to choose between a
30-year-old man and a 30-year-old woman. Thirty-seven chose the man,
43 chose the woman, and 11 expressed indifference.

We are not aware of any other systematic empirical evidence about
people's preferences for saving lives identified by age or by sex.
However, this empirical evidence, along with casual observation of
attitudes for public programs, suggests that a majority of people
would at least reject the relative value of saving men and women that
is implied by the simple livelihood method. In the provision of
public services, where objectives may include allowance for factors
such as income redistribution, and externalities such as the numbers
of dependents that will be orphaned, the social evaluation may even
vary inversely with the measures of livelihood involved.

Even if we were satisfied that the livelihood procedure formed a
conceptually sound basis for public program evaluation, an important
practical issue remains to be resolved: market earnings in some cases
do not equal the productivity of an individual's labor.

B. The Issue of Earnings vs. Productivity

A person's earnings may differ significantly from his productivity
for a number of reasons. For instance, workers in a strong union may
earn considerably more than workers doing identical nonunionized
work. Some groups may face earnings discrimination because of their
race, ethnicity, or sex. Some people (e.g., people with job
seniority) may be receiving an income substantially above their
productivity. The livelihood measure is blind to these distortions.
It merely says to add up the earnings of people who may be affected
by different programs, and select the ones that save the most
earnings. Since diseases typically do not affect different racial,
sexual, or socioeconomic groups uniformly, a criterion that depends
on earning differences among these groups will necessarily slant
public programs in particular directions. If some diseases are found
more often in people with higher earnings, the rule says to devote
your attention and resources to these diseases.

The undesirable nature of this criterion is brought home acutely when
we consider the implications for the treatment of women (although it
applies in less extreme form to any case where wages do not reflect
productivity). The national product accounts do not include the
homemaker services of women if they are not purchased; but to exclude
them from a measure of project benefit will seriously undervalue
programs that affect women. The most common procedure is to value




also been used (see, for example, Feldstein

Using the earnings of a domestic servant is only partly satisfactory,
however. In the first place, the homemaker may be providing quantity
or quality of services that are not available in the market. For
instance, when we observe a woman with advanced education who could
take a job paying two or three times a domestic servant's income, she
may be staying home to raise her small child because she feels the
first few years are important and because she does not feel she could
hire such high-quality nurturing for her child. Under the
circumstances, using the domestic servant's earnings will understate
the value of this woman's home activities, as she sees them. In such
circumstances, we could argue that her services at home should be
valued at least as highly as the highest salary the woman could earn.
However, we probably do not really wish to adopt the implications of
such reasoning. After all, many people accept jobs at a salary less
than the maximum they could command in the market. They may do this
in order to have better working conditions or in order to pursue a
particular type of work. In the extreme, the implication of this
foregone opportunity argument is that we should value everyone's
services, men's and women's, at the highest possible wage they could
earn. Ignoring the readjustment this would cause in the general wage
scale, such a recalculation would raise the implicit earnings of
society considerably.

A second objection to the standard treatment of home production is
that it is asymmetric with respect to sex. After all, women are not
the only workers around the home. Morgan et al. and Walker and Gauger
surveyed people about the hours they spend working around the house.
They found that men spend between about one-eighth and one-third as
much time as do women, depending on the employment status of the
woman, and the ages and family sizes involved. If we are imputing a
value to individuals for their home production, then it seems
appropriate to add an element to the man's livelihood calculation.

The third objection to the standard treatment of home production lies
in the treatment of older women, especially over 65 years of age.
Rice and Cooper attributed a full domestic worker's income to
nonemployed women over 65, causing their livelihood to exceed
significantly that of a man over 65. One could speculate that women
over 65 start to slow down in their household activities, but it is
difficult to find data. Walker and Gauger did not survey older women.
We analyzed the results of the Productive Americans Survey (partially
reported in Morgan et al.). The number of observations is relatively
small in the over-65 age group, but there appears to be a downturn in
average number of hours worked at home by women and an increase in
the hours worked by men. Women's hours declined about 19 percent in
the over-65 age group and men's hours increased about 17 percent.
This leaves women over 65 reporting about [illegible] hours of
housework per week and men reporting about 6 1/2 hours. These figures
may represent an overstatement of true contribution if productivity
falls significantly in this age group. Furthermore, there may be some
reporting error if the respondents have little else to do and
therefore claim that most of their time goes to housekeeping.

Since there are no compelling theoretical arguments for one rule over
another in accounting for household production, livelihood tables can
be generated under a variety of assumptions about the value of
women's and men's contributions. These calculations show significant
variation in the livelihood, especially in the upper ranges,
depending on the assumptions employed. For illustration, Figs. 1 and
2 plot the livelihood at different ages for a four-way breakdown of
sex and race under two of the assumptions possible for treating home
production. The assumptions behind the calculations are discussed in
more detail in Acton, but briefly, Fig. 1 (Assumption 1-1) assigns a
value of $4800 for the domestic work of nonworking women. Fig. 2
(Assumption 3-3) assigns a variable amount to women's homemaker
function (depending on their employment status) and a uniform amount
to men. After 64 years of age, women's contribution is reduced (19
percent), to reflect a drop in household activities, and men's is
increased (17 percent). A 4 percent net discount rate is used for
both figures.
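
The sensitivity of the livelihood figures to the treatment of household production can be illustrated with a small Python sketch. It is a simplification, not a reproduction of the calculations behind Figs. 1 and 2: the $4800 figure, the 19 and 17 percent adjustments after age 64, and the 4 percent discount rate come from the text, while the $1200 amount for men, the age range, the omission of survival probabilities, and the application of the age adjustment to a flat yearly amount are assumptions made only for illustration.

```python
def household_value(base_value, age, sex):
    """Yearly imputed value of home production.

    After 64 years of age the value is reduced 19 percent for women and
    increased 17 percent for men, as described in the text.
    """
    if age > 64:
        factor = 0.81 if sex == "F" else 1.17
        return base_value * factor
    return base_value


def present_value(stream, rate):
    """Discount a yearly stream, received at the end of each year."""
    return sum(v / (1.0 + rate) ** t
               for t, v in enumerate(stream, start=1))


RATE = 0.04                 # net discount rate used for both figures
ages = range(60, 75)        # hypothetical span of remaining years

# Unadjusted rule: a flat $4800 a year for a nonworking woman.
flat_women = [4800.0 for _ in ages]

# Adjusted rule: same base, reduced after age 64; men receive an
# assumed uniform $1200 a year, increased after age 64.
adjusted_women = [household_value(4800.0, a, "F") for a in ages]
adjusted_men = [household_value(1200.0, a, "M") for a in ages]

pv_flat_women = present_value(flat_women, RATE)
pv_adjusted_women = present_value(adjusted_women, RATE)
pv_adjusted_men = present_value(adjusted_men, RATE)
print(round(pv_flat_women), round(pv_adjusted_women),
      round(pv_adjusted_men))
```

Even in this stripped-down form, changing the imputation rule lowers the figure assigned to older women and raises the figure assigned to older men, which is the kind of variation the two plots are meant to show.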

We do not intend to focus on the nature of livelihood at different
points in life or to concentrate on differences among races and sexes
(although they are already quite substantial). These plots, however,
serve to emphasize the substantial variability due to alternative
assumptions about the valuation of household activities and the
substantial impact this has on the relative and absolute amount
assigned to women by this criterion. The effect of these alternative
assumptions is significant at all ages, but it is especially
noteworthy in the over-65 age range where a substantial amount of
mortality and morbidity is involved from such prominent ailments as
heart and circulatory diseases and cancer.

The plots in Figs. 1 and 2 show a close similarity between the
livelihood for white females (WF) and all other females (AOF). This
is due to the relatively low work rates of women, combined with the
assumption that all nonworking women are assigned the same value of
household services regardless of race. The difference between white
males (WM) and all other males (AOM) is about the same under the two
assumptions and measures about $60,000 higher for white men in their
late 20's than nonwhite men of the same age. The difference between
sexes is dramatic, with the livelihood of white males at its peak
about 2 1/2 times the level of white females at its peak under
Assumption 1-1. When the household production of working men and
women is given an imputed value (Assumption 3-3), the differences
between the sexes narrow considerably. At its peak, white men's
livelihood is only 1.7 times that of white women. The male/female
ratio is even closer for nonwhites.






The other major effect of the different assumptions comes in the
crossover between male and female livelihood in the upper age
brackets. Under Assumption 1-1, female livelihood crosses male
between 50 and 60 years of age, due both to the lower life
expectancy of men and the fact that women are assigned a value of
household production while the generally retired men are not.
Consequently, over 65 years of age, male livelihood falls to
extremely low levels, while female livelihood remains between
$20,000 and $40,000. Under Assumption 3-3, when a greater value is
assigned to household production for men and for working women, the
reversal for white men's and women's livelihood is postponed to the
early 60's, and the livelihood of men is higher than before in both
relative and absolute terms. The reversal for nonwhites is pushed to
a lower age, but the difference at all ages is narrowed
considerably.

IV. The Willingness-to-Pay Measure of Value

The fundamental assumption of the willingness-to-pay procedure is
that individuals' preferences should count, that citizens can and
should play a role in policymaking for governmental services that
affect them directly. Their health, their friends, their taxes,
their pain and suffering, and their welfare are at stake.
Understandably, they have an interest in the public activities that
may be undertaken. Individuals are the ultimate recipients of the
impact of programs.

Political justifications for using individual preferences go back at
least to the 17th century and include the desire for no taxation
without representation. Economic arguments for using individuals'
preferences date to the 19th century and include the utilitarian
principles of Bentham. Dupuit, a French engineer, argued that the
nature and amount of public transportation facilities should be
determined by what the potential users would be willing to pay for
using them. Most contemporary economists who study public policy
evaluation agree that an approach based on individual values is
correct in principle.

The "potential Pareto improvement" standard which justifies the
willingness-to-pay procedure has been criticized because it makes
the estimated dollar benefit of a program dependent on the income
distribution. This dependence has been criticized either because
(1) it is felt that the income distribution is inequitable and hence
not a just basis of public program evaluation, or (2) it is felt
that whether or not the income distribution is equitable, it is
simply not an appropriate basis for determining the production and
distribution of certain goods (possibly including adequate health
care and safety) which are, like the vote, properly considered
noncontingent privileges of membership in society. The problem which
has not been solved by critics is to devise an alternative benefit
measure which satisfies such objections. The livelihood measure is
even more directly tied to income distribution (viz., by definition)
than is the willingness-to-pay measure, and it is not impossible
that precedent political decisions were influenced by the economic
power of various interest groups.

The principal practical problem with the willingness-to-pay
procedure for benefit estimation is that developing accurate
assessments of individuals' willingness to pay is difficult and
expensive, and the validity of published attempts to apply various
estimation techniques is questionable. Furthermore, the extent to
which estimates of a particular population group's willingness to
pay for a particular safety-enhancing project can be applied to
other groups and other types of projects is unknown.

The two principal methods for measuring the values a household would
place on a prospective public project are (1) inferences of how much
the household values mortality reduction, based on observations of
the implicit value the household places on safety and health in
making private consumption and job-selection decisions, and
(2) survey questionnaires which ask household heads to state their
willingness to pay for the program benefit which is under
consideration.

A. Implicit Values

We can, in principle, infer the values individuals attach to
mortality- and morbidity-reduction in the same manner as was
proposed for governmental actions (Section II above). Such a
revealed preference approach is followed with most market-produced
goods that have few externalities. We need not go into a detailed
survey of relative preferences for, say, apples and oranges. People
reveal the preferences they attach by their market behavior. This is
the method we would like to use if we want to measure individuals'
true preferences for the programs. It presents the strongest claim
to validity because the people have to back up their preferences
with action, and they do it in the context of other everyday
decisions for spending money. These choices may include the purchase
of safety devices (for example, seat belts), a marginal expenditure
on health items (perhaps a doctor's examination and some antibiotics
for an infection), or the premium demanded for accepting an elevated
risk (for instance, higher wages for extrahazardous employment).

Recent studies by Thaler, Thaler and Rosen, Smith, and Usher have
provided measures of implicit willingness to pay for lifesaving.
Thaler, Thaler and Rosen, and Smith examine the higher wages paid in
occupations with above-average risk of death for evidence about the
implicit value of lifesaving. Usher employs a life-cycle model of
utility maximization and infers the trade-off between consumption
and probability of survival from a time series of the national
income accounts and mortality statistics. Both approaches have the
potential of overcoming some reservations about the survey-based
willingness-to-pay approach because they examine behavior revealed
through market activity and therefore have stronger claims to
validity and stability than existent survey results.

Since the two Thaler studies and the Smith study rest on market
wages, they have some drawbacks in common with the livelihood-saving
approach. First, the measure requires that the person be working to
determine a value. Therefore, it is difficult to determine the
appropriate value for housewives, children, retired persons, and
others who are not paid for their work. A second criticism relates
to the representativeness of the group observed in riskier
occupations. Presumably, those who are least risk-averse will enter
a given occupation before those who are more risk-averse, all other
things the same. Consequently, lower risk premiums will be paid to
those who select the occupation than would be necessary to
compensate a randomly chosen individual who was subjected to that
level of risk, and these measures will be a lower bound on
"society's value." Third, the extra pay is compensation for assuming
an above-average risk, and for that reason may not provide an
appropriate measure of value for programs which are designed to
reduce risk. The compensation which a risk-averse person would
require to accept a Δp increase in the probability of his own death
is greater than the amount he would be willing to pay for a Δp
reduction in this probability, although the amounts will be close to
one another for small Δp. Fourth, the wage premium observed will not
necessarily reflect the externalities (to family and/or society)
associated with a person's death, although the externalities will be
better captured with this measure than with the livelihood-saving
approach if the employee includes his family in the job-choice
decision and requires that the wage differential be adequate to
compensate them for his increased risk as well. Fifth, it is
difficult to identify what portions of differences in compensation
are due to the additional risk of death, risk of injury, and other
working conditions. Sixth, although it is not a general phenomenon,
there may be some occupations in which the participants receive some
utility from the risk, and therefore the compensation is inadequate
for a normal person. Being a stock car racer or being a test pilot
may be extreme examples, but this consideration may be reflected to
some degree in a number of occupations, some of which are included
in Thaler's calculations. Finally, at the conceptual level, we do
not know for certain what risks of death or injury the individual
assumed were in force when he accepted the wage offer. Given the
difficulty Thaler seems to have had in getting good data on death
rates by occupation, the amount of uncertainty a given individual
faces about the risk at a particular job site may be substantial.
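The third point, that required compensation for a Δp increase exceeds willingness to pay for a Δp reduction but converges to it for small Δp, can be illustrated numerically. The sketch below rests on assumptions that are not in the paper: expected utility is taken to be (1 - p) times the square root of wealth, the utility of death is normalized to zero, and the wealth and probability figures are hypothetical. It is one simple model that produces the asymmetry, not the model the studies used.

```python
def wtp_and_compensation(wealth, p, dp):
    """Return (x, c) for a change dp in the death probability p.

    Assumed model: expected utility = (1 - p) * sqrt(wealth),
    with the utility of death set to zero.
    x: the most the person would pay for a reduction of dp.
    c: the least the person would accept for an increase of dp.
    """
    survive = 1.0 - p
    # Solve (survive + dp) * sqrt(wealth - x) = survive * sqrt(wealth) for x.
    x = wealth * (1.0 - (survive / (survive + dp)) ** 2)
    # Solve (survive - dp) * sqrt(wealth + c) = survive * sqrt(wealth) for c.
    c = wealth * ((survive / (survive - dp)) ** 2 - 1.0)
    return x, c


# Hypothetical person: wealth 100,000 and baseline death risk 0.01 per year.
for dp in (0.001, 0.01, 0.1):
    x, c = wtp_and_compensation(wealth=100_000.0, p=0.01, dp=dp)
    print(dp, round(x, 2), round(c, 2), round(c / x, 3))
```

Under these assumptions the ratio c/x stays above one and moves toward one as Δp shrinks, so a wage premium for accepting risk would overstate, though only slightly for small changes, what the same person would pay for an equal reduction.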

On the empirical side, Thaler found significant variation in
implicit valuation depending on the data source used. With one data
file, he inferred a value of between $176,000 and $260,000 per
expected life saved (for a reduction in probability of 0.001 per
year), which is remarkably close to the peak human capital value
observed for young men and to the explicit willingness to pay
obtained by Acton in his survey for a reduction of 0.001 in heart
attack death rate. On the other hand, the value implicit in the
Bureau of Labor Statistics injury data was over $2.6 million per
expected life. Furthermore, Thaler's regression results with the BLS
data yield an incorrect sign for the coefficient of risk of injury.
The regression with the first data file did not include a variable
for risk of injury, so his results are subject to omitted variable
bias, and the difference between the first and second estimates may
be even more extreme than it appears.
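The "value per expected life saved" quoted above can be read as a simple ratio: the annual payment associated with a risk change, divided by the size of that change in annual death probability. The sketch below only rearranges the figures given in the text; the function names are illustrative, and the implied annual amounts are arithmetic consequences of those figures rather than numbers reported by Thaler.

```python
def implied_value_per_life(annual_payment, delta_probability):
    """Dollars per expected life saved implied by a payment for a risk change."""
    return annual_payment / delta_probability


def implied_annual_payment(value_per_life, delta_probability):
    """Annual payment per person consistent with a given value per life."""
    return value_per_life * delta_probability


# Figures from the text: $176,000 to $260,000 per expected life saved,
# for a change in death probability of 0.001 per year.
for value in (176_000.0, 260_000.0, 2_600_000.0):
    payment = implied_annual_payment(value, 0.001)
    print(f"value per life {value:>11,.0f} -> about {payment:,.0f} per person per year")
```

Put another way, 1,000 people each facing a 0.001 change account for one expected life, so their combined annual payments equal the value per expected life saved.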

Usher's study is an imaginative use of the (Canadian) national
income accounts to infer a tradeoff between consumption over a life
cycle and resources devoted to death reduction. He makes utility
solely a function of consumption in each time period (which is equal
in all time periods) as well as the probability of surviving, and
employs strong assumptions about the form of the utility function to
make his estimates. Given the strong assumption about functional
form, the potentially severe aggregation bias from using such highly
aggregated data to infer a utility function for individuals, and the
absence of an indication of the level of statistical significance,
we may wish to place most emphasis on the qualitative findings.
Usher's model implies that the value per expected life saved is
greatest at a very young age (it peaks around age 2 for plausible
values of his parameters) and decreases through increasing age. Its
value in the age sample 20-30 is very close to the human capital
values reported for white males by Rice and Cooper. Since utility is
a function solely of consumption (not earnings) and since he assumes
that everyone consumes the same amount in each year of life, there
is no difference between the value assigned to men and women in
Usher's model.

B. Explicit Statements of Individuals

The survey approach permits measurement of the entity which is
directly appropriate to evaluating a proposed public project: the
maximum amount each affected household would be willing to pay to
have the project adopted. In theory this procedure requires no
assumptions about individual preferences (e.g., linearity,
indifference to cause, absence of externalities) which other
techniques require. Since the expense of conducting a special survey
for every proposed project would be prohibitive, however, in
practice we would want to generalize from the results of one survey
in order to assess other project proposals; such generalizations
will of course require some assumptions on preferences.

While willingness-to-pay surveys have been conducted successfully in
recreation program evaluation, the only published survey we have
found of willingness to pay for health programs is contained in
Acton, and that survey deals only with programs that reduce chances
of sudden accidental death or heart attack death. It sought
preliminary evidence on the feasibility of applying
willingness-to-pay responses to actual program evaluation and
addressed several questions:

• Can questions be formulated that in principle get at willingness
  to pay?

• Do people seem willing to answer and are they relatively
  comfortable in answering such questions?

• Are the responses people make subject to a rational
  interpretation?

• What seem to be the major factors influencing stated willingness
  to pay?

In total, approximately 125 persons were questioned about their
willingness to pay for heart attack mortality reduction. People were
posed four types of questions:

1. Age choice questions: Which of two seriously injured would you
   like to see saved in an emergency? Those results were discussed
   above in the critique of livelihood-saving measures.

2. Lives in the community: How much would you be willing to pay to
   have a heart attack ambulance that is expected to save X lives
   per year of the 10,000 people living around you?

3. Advice willingness to pay: Suppose your neighbor has just been
   told his risk of heart attack is Y per year, and his chances of
   dying if he has a heart attack are Z. How much do you think he
   should be willing to pay per year for a heart attack program that
   would reduce his chances of dying to Z*?

4. Own willingness to pay: Suppose your doctor tells you your
   chances of a heart attack are Y per year, and your chances of
   death, given the heart attack, are Z. How much are you willing to
   pay per year for a heart attack program that can reduce your
   chances of dying to Z*?

Each respondent answered 26 questions of type (1), two questions of
type (2), and four questions each of types (3) and (4).

The results showed that we can pose questions that get at the
underlying issues of willingness to pay. Furthermore, people were
willing to complete the interview and seemed relatively comfortable
and responsive in doing so (the refusal and breakoff rates were
negligible). The question of rational interpretation of the
responses was not clearly resolved in a single survey of this size.
Responses varied significantly from one individual to the next (only
part of this could be explained by sampling variance due to sample
size). High variation per se is neither unexpected nor undesirable
for these types of questions. We expect preferences and attitudes to
vary from one individual to the next, even for identical expected
benefits offered to individuals who appear to be similar in their
socioeconomic and demographic profiles. Nevertheless, the responses
of most persons could be given a rational interpretation, and
predicted effects were found for important explanatory variables
such as income, wealth, age, and sex. The empirical results are
discussed in detail in Acton. Briefly, the principal statistically
significant findings were, first, that willingness-to-pay responses
increase with increasing probability of death and with greater
reductions that are offered, but not in a linear fashion. Second,
willingness-to-pay responses are greater the more concretely and
immediately the hypothetical program is related to the individual.

If such willingness-to-pay responses were to be used routinely for
program evaluation, we would wish to conduct a survey of a greater
number of respondents (appropriately selected for statistical
representativeness) where the questions included several different
probabilities of mortality and morbidity, and several different
reductions in the values of each health consequence. If it appeared
conceptually or empirically desirable, separate sets of questions
for major categories of diseases or risks should be prepared (for
instance, heart diseases, cancer, accidents, and so forth). If
satisfactory, statistically significant willingness-to-pay
relationships were found, then it would probably be most efficient
to use the results in multivariate regression equations to estimate
the aggregate willingness to pay associated with a particular
program, taking account of the socioeconomic and demographic
characteristics of the population affected and the anticipated
changes in probabilities.

A number of issues are still left open in the feasibility of a
survey-based method for eliciting value. These include the validity
of the responses, their stability and replicability, problems with
understanding and processing the information in these hypothetical
situations, and strategic behavior in responding.

The validity of responses to willingness-to-pay questions has not
been examined empirically. Indeed, it is not clear that the validity
can ever be firmly established. A rigorous test of validity might be
to survey a group of people and then come back and actually market
the goods that had been described (say, a heart attack ambulance) or
raise their taxes in accordance with responses. Some people might
refuse to act in accordance with their previous responses because of
intervening factors which may be difficult to control for and which
the respondent cannot even articulate.

The stability and replicability of these preliminary results have
not been demonstrated. Further empirical work is clearly needed to
see if the same people respond with a reasonably stable set of
preferences when resurveyed at a later date. Furthermore, we should
see if the results can be replicated in other geographic areas with
different socioeconomic and ethnic samples.

We face several competing objectives in asking questions that are
both realistic and yet understandable for the respondents. Since
many of the situations we pose to people are hypothetical (either
the disease state or the consequences of the programs), we are
uncertain about the individual's comprehension of the situation. For
instance, although heart disease accounts for about [illegible] of
all deaths per year, the realistic chance a person has of dying from
a heart attack is less than 1 per 100 per year for the majority of
adults. We are, as yet, uncertain about how well people understand
and process such numbers.

Similarly, we do not necessarily know how well people understand the
nature of certain disability states or recoveries. The operationally
relevant point, however, is whether they understand the situation
well enough during an interview that their preferences do not change
significantly if a decision is made to inaugurate the program. The
most direct way to test this assumption is to examine the stability
of responses over time.

A fourth unresolved issue in willingness-to-pay elicitation is
whether people will engage in strategic behavior when they respond.
Lindahl observed that when you try to find out people's preferences
for public programs, they may have an incentive to underrepresent
their true valuation if their taxes depend on their stated value.
Acton and Bohm observed that the opposite case may also exist if
people think the decision whether or not to have the program is
based on aggregate value, but the cost-sharing rule is determined by
a different rule. Under these circumstances, if the person feels he
will be called on to bear a small proportion of the costs for a
project he wants, he should overrepresent his willingness to pay for
it. Dreze and Poussin have shown that under some circumstances,
people will have the correct incentives to reveal their true
preferences for public goods that are already being produced. Bohm
suggests that people be posed questions where the payment rule is
deliberately specified as yet-to-be determined. In this manner, he
expects to cancel the incentives to over- or underrepresent true
feelings, because people will not be able to select a strategy for a
misrepresentation of preferences that is guaranteed to make them
better off than telling the truth.

Bohm conducted an experiment to see how sensitive willingness-to-pay
responses were to question wording and to analyze whether strategic
behavior seemed present. The sample does not purport to be fully
representative (only 211 of 605 randomly selected residents of
Stockholm agreed to participate), but the experimental design is
intriguing and to the point. He paid the volunteers Kr.50 ($10) for
a one-hour "interview" about television programs. When the
respondents came to the studio, they were told the interview was
delayed and they were put in a room with TV screens and given an
opportunity to watch a comedy show with two very popular comedians.
They were given the impression that several other respondents were
in similar rooms around the building and that the program would be
shown only if the aggregate willingness to pay exceeded the cost
associated (Kr.500). The different respondents were randomly given
different instructions about what the decision rule for actual
showing would be. If people were behaving strategically, some
instructions should cause significantly higher responses than other
instructions. Bohm's empirical results show no statistically
significant difference (at 5 percent) in the responses from one
question form to another.

At the moment, we can conclude that although strategic
misrepresentation may exist in principle in the willingness-to-pay
context, it has not been demonstrated to be a significant empirical
factor. At the pragmatic level, it is relatively unlikely to be a
serious problem with preliminary efforts to assess people's values,
because people are not accustomed to having their tax bill react to
such statements of value.

Many of these potential problems in implementing a
willingness-to-pay measure will be clarified only with additional
empirical evidence. For instance, the estimates of the true variance
of responses in society and the mean value of the responses can only
be judged by conducting surveys of representative populations of
respondents. Similarly, the reproducibility and stability of
responses over time can be measured but have not yet been explored
empirically. Some of the more basic concerns about the validity of
the responses and the internal consistency of a given person's
responses are more difficult to resolve. We have crude measures of
what "internal consistency" means, but to demonstrate rigorously its
existence (or nonexistence), hard thinking is needed. An interactive
process of both conceptual development and refined empirical
evidence seems to be the most viable strategy for furthering our
understanding in both areas. Furthermore, if done with some
preplanning, we can also provide useful interim survey results that
can be used as one measure of social impact valuation for current
evaluation efforts.



V. Conclusion 

There are important conceptual and empirical differences between the
approaches to evaluation reviewed here. The choice of method is
important and may change the ranking and value of health or safety
programs significantly. The selection of a particular method
involves tradeoffs between ease of application and conceptual
soundness. The livelihood-saving approach is easy to apply (and has
been used frequently in the past), but it has a number of drawbacks
when its implications are examined in detail. An approach based on
individual preferences (operationally, what people are willing to
pay) meets the drawbacks of the livelihood approach and is
conceptually most satisfactory. Preliminary evidence suggests that
it is feasible to ask for explicit statements and that meaningful
answers result, but a number of problems may arise in implementation
on a large scale. There has been very little empirical experience
with measuring implicit value or with conducting surveys of people's
willingness to pay for public programs. In the revealed preference
approaches we may not observe a representative group of people, and
it may be difficult to know with certainty that observed behavioral
differences should be attributed only to differences in level of
risk. Correspondingly, we do not know what the stability of survey
responses is over time nor what the sample variance is likely to be.
Furthermore, the validity and internal consistency of these
responses are not yet established. It is difficult to specify
rigorous tests of the external validity of these sorts of questions,
but an interactive development of the conceptual underpinnings and
empirical evidence provides promise of sharpening our understanding.



For many actual evaluations, both the livelihood-saving approach
(with its known drawbacks) and an imperfect, crudely measured
willingness-to-pay methodology are clearly superior to no formal
analysis. First, the analysis is frequently an order-of-magnitude
evaluation. Under these circumstances, the drawbacks or questions we
have about either approach are second-order magnitudes and do not
affect the conclusion whether or not to undertake the program.
Second, employing both criteria to see if they yield the same
conclusion can reinforce one's confidence in the robustness of the
decision. Third, in the range of expected effectiveness for many
realistic programs, the approaches frequently lead to reasonably
close measures of value.



When given a choice between livelihood-saving or willingness to pay
as a basis for evaluating social impact, a strong case can be made
for the conceptual superiority of willingness to pay. The livelihood
measure does not bear any necessary relationship to what people want
in the way of public programs. If we decide to fund programs by this
criterion, we know that we could, in general, raise adequate
revenues by taxing those whose livelihood is extended. However, this
criterion does not guarantee that society or any individual is made
better off by adopting the program.

An individual preference approach (based on willingness to pay) does
provide us with an assurance that society is made better off in some
sense by the programs that pass the criterion. By approving only
programs such that people are willing to pay, in the aggregate, more
than the program's costs, we can make a strong case that society as
a whole gains. It is clear that in general the program will be
funded in a manner such that some people gain and some lose with a
particular implementation. Nevertheless, since the aggregate
willingness to pay exceeds the cost, it would be possible to spread
the costs such that no one was made worse off by the program. That
is, with this criterion we identify potential Pareto-superior moves
for society. Every member can be at least as well off as he was
without the program, and at least one person is better off.
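The aggregate criterion and the cost-spreading argument can be made concrete with a short sketch. Everything specific here is assumed for illustration: the four willingness-to-pay figures, the cost of 500, the function names, and the use of a proportional cost-sharing rule as one (not the only) way to spread costs so that nobody pays more than his stated value.

```python
def passes_aggregate_test(wtp, cost):
    """True when total willingness to pay exceeds the program cost."""
    return sum(wtp) > cost


def proportional_shares(wtp, cost):
    """Split the cost in proportion to each person's willingness to pay."""
    total = sum(wtp)
    if total <= 0:
        raise ValueError("total willingness to pay must be positive")
    return [cost * w / total for w in wtp]


def losers(wtp, shares):
    """Indexes of people whose cost share exceeds their willingness to pay."""
    return [i for i, (w, s) in enumerate(zip(wtp, shares)) if s > w]


# Hypothetical stated values for four people and a hypothetical program cost.
wtp = [0.0, 50.0, 120.0, 400.0]
cost = 500.0

print(passes_aggregate_test(wtp, cost))          # aggregate criterion
equal = [cost / len(wtp)] * len(wtp)             # one possible financing rule
print(losers(wtp, equal))                        # who loses under equal shares
print(losers(wtp, proportional_shares(wtp, cost)))  # who loses under proportional shares
```

With these numbers the program passes the aggregate test, an equal split leaves several people paying more than the program is worth to them, and a proportional split leaves no one worse off, which is the sense in which the move is potentially Pareto-superior.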

Although we started this paper with the objective of identifying
means of placing a value on reductions in probability of death or
disability, we should recognize that it may not be possible (or
desirable) to have a unique value that can be used in several
different contexts. Instead, it may turn out that preferences are
such that we have one value for a change in probability of cancer
death, another value for a change in probability of heart attack
death, and yet a third value for a change in probability of
accidental death, even for similar persons and identical starting
risks and reductions in risks. Given the diversity of values now
implicit in public decisionmaking, such a finding would not be
unexpected. Furthermore, analysts like Zeckhauser argue that the
process by which public decisions are made may be at least as
important as the actual numerical values used. An appropriate
strategy for the decisionmaker charged with evaluating lifesaving
programs before additional methodological and empirical research
takes place may be to employ more than one of the techniques
discussed. When the different approaches yield similar conclusions,
he can gain confidence from the fact that his evaluation does not
seem to be sensitive to the values employed. When they yield sharply
different conclusions, he can probe his own preferences or seek
additional evidence about the willingness to pay of the relevant
population.






Footnotes

* Economist, The Rand Corporation, Santa Monica, California. I wish
to acknowledge with gratitude the comments of P. Cook, W. Manning,
B. Mitchell, J. Newhouse, J. Vaupel, M. Weinstein, and A. Williams.
The views are those of the author and do not necessarily reflect
those of the Rand Corporation or any of its corporate sponsors.

1. Formal prospective evaluation of governmental programs, as
discussed here, is a relatively young discipline. Water resource
allocation has the longest history in the U.S., having been charged
since the 1930's to determine "if the benefits to whomsoever they
accrue are in excess of the costs." (From Flood Control Act of 1936,
cited in A. Prest and R. Turvey, "Cost Benefit Analysis: A Survey,"
in SURVEYS OF ECONOMIC THEORY, St. Martin's, New York, [page
illegible] (1966).) Most of these applications in water resources
have been limited to economic benefits and costs, although
considerations such as recreational values and their distribution
have been added; see, for example, B. Weisbrod, "Income
Redistribution Effects in Benefit Cost Analysis," in S.B. Chase
(ed.), PROBLEMS IN PUBLIC EXPENDITURE ANALYSIS, The Brookings
Institution, Washington, D.C., 177-209 (1968).

A number of economists have reviewed various aspects of the
evaluation literature. Prest and Turvey provide a good background
review of the cost-benefit literature. P. Steiner (PUBLIC
EXPENDITURE BUDGETING, The Brookings Institution, Washington, D.C.
(1969)) focuses on a number of issues in program budgeting for
federal programs. H. Klarman reviews literature related to health
evaluation, focusing on the evaluation of health technology, in
"Application of Cost-Benefit Analysis to Health Systems Technology,"
in Morris Collen (ed.), TECHNOLOGY AND HEALTH CARE SYSTEMS IN THE
1980's, DHEW Publication No. HRA 74-3011, Washington, D.C. (1973),
NTIS PB No. 220 613, 266p. R. Thaler ("The Value of Saving a Life: A
Market Estimate," Ph.D. dissertation, Department of Economics,
University of Rochester, New York (1974)) reviews some historical
attempts at valuation of lifesaving, and R. Zeckhauser ("Procedures
for Valuing Lives," PUBLIC POLICY, Vol. 23, No. 4, 420-463 (Fall
1975)) provides discussion of some recent applications. There are
several essays on public expenditure in general. Dorfman and Chase
have edited volumes focusing on particular problems of public
expenditure evaluation; see R. Dorfman, MEASURING THE BENEFITS OF
GOVERNMENT INVESTMENTS, The Brookings Institution, Washington, D.C.
(1965), and S.B. Chase, PROBLEMS IN PUBLIC EXPENDITURE ANALYSIS, The
Brookings Institution, Washington, D.C. (1968). R.H. Haveman and
J. Margolis have edited a (sometimes revised) set of essays on
Planning, Programming, Budgeting System experience by a number of
practitioners and academics, titled PUBLIC EXPENDITURE AND POLICY
ANALYSIS, Markham, Chicago (1970). Some of the most extensive and
successful applications of formal analysis have been in the defense
area. Although they have tended to be cost-effectiveness rather than
cost-benefit analysis (i.e., How can we best achieve a defense or
tactical or strategic posture, without asking how expensive a
posture we should have?), some techniques developed there form the
basis of analysis, especially regarding the general structuring of
decisionmaking under uncertainty and the quantification of uncertain
outcomes. A good introduction to this systematic approach to
analysis, with description of a variety of techniques, is found in a
collection of essays edited by E.S. Quade and W.I. Boucher, SYSTEMS
ANALYSIS AND POLICY PLANNING, American Elsevier, New York (1968).

2. See, in general, E.J. Mishan, "Evaluation of Life
and Limb: A Theoretical Approach," JOURNAL
OF POLITICAL ECONOMY, Vol. 79, No. 4,
687-705 (1971). An interesting discussion of
whose interests should be reflected in benefit
valuation which considers the intergenerational
problems is to be found in J.A. Dowie, "Valuing the
Benefits of Health Improvement," AUSTRALIAN
ECONOMIC PAPERS, Vol. 9, No. 14, 93ff (1970).

3. This criterion was originally proposed by both
N. Kaldor, "Welfare Propositions of Economics
and Interpersonal Comparisons of Utility,"
ECONOMIC JOURNAL, Vol. 49 (1939); and J.R.
Hicks, "The Foundations of Welfare Economics,"
ECONOMIC JOURNAL, Vol. 49 (1939). A good
recent discussion in the "valuing lives" context is J.
Hirshleifer, "The Economic Approach to Risk-Benefit
Analysis," in David Okrent (ed.), RISK-BENEFIT
METHODOLOGY AND APPLICATIONS
(processed), UCLA-ENG-7598 (December 1975).

4. A term due to Schelling (T. Schelling, "The Life
You Save May Be Your Own," in S. Chase, ed.,
PROBLEMS IN PUBLIC EXPENDITURE
ANALYSIS, The Brookings Institution, Washington,
D.C., 127-176 (1968)), as distinct from the
lifesaving, or willingness-to-pay, approach.

5. See Mishan, note 2 supra.

6. G. Calabresi, THE COST OF ACCIDENTS: A
LEGAL AND ECONOMIC ANALYSIS, Yale
Univ. Press, New Haven (1975).

7. R. Posner (ECONOMIC ANALYSIS OF LAW,
Little, Brown and Company, Boston (1972)) and
R. McKean ("Products Liability: Implications of
Some Changing Property Rights," QUARTERLY
JOURNAL OF ECONOMICS, Vol. LXXXIV, No. 4,
611-626 (Nov. 1970)) have explored conditions
under which economic efficiency is improved by
assigning liability to one party (say, the producer of
a good) rather than permitting the market to supply
(or fail to supply) products that provide reductions
in risk. Although, in general, these liability
solutions proposed to improve economic efficiency
will understate the value of lifesaving or disability
saving that would be inferred from a direct
assessment of willingness to pay, they cannot be used
as an unambiguous lower bound because of
transaction costs and lack of perfect information,
possible differences between the group determining
the law and those engaged in the transaction,
punitive elements to settlements, or differences
between the group affected ex ante and the group
being compensated ex post.

8. See R. Eisner and R. Strotz, "Flight Insurance
and the Theory of Choice," JOURNAL OF
POLITICAL ECONOMY, Vol. 69, No. 4, 355-368
(August 1961).

9. J. E. Cohen ("Livelihood Benefits of Small
Improvements in the Life Table," HEALTH SERVICES
RESEARCH, 82-90 (Spring 1973)) reminds
us that it is crucial to make clear the time course of
the benefit for epidemiological as well as
valuational reasons. Frequently, analysts have in mind a
program that offers a reduction in possibility of
death that is effective for one year at a time.
Cohen points out that some program benefits may
be more accurately characterized in a different
manner, and that the alternative definition may
make a large difference in the measured benefit.

He defines a "curative" benefit as one that offers a
person a one-time save (or reduction in probability
of death) from a disease, regardless of the age at
which it occurs, and then the person falls back into
the normal risk pool. He defines a "preventive"
benefit as one that eliminates a particular cause of
death entirely. Cohen shows that substantial
differences can arise in the measure of total benefit
when a curative or preventive benefit rather than a
one-year exposure benefit is involved. In the case
of kidney disease for U.S. males, his calculations
yield a total benefit about 22 times as large as that
of J. Hallan, et al., THE ECONOMIC COST OF
KIDNEY DISEASE AND RELATED DISEASES
OF THE URINARY SYSTEM, PHS Pub. No.
1940, U.S. G.P.O., Washington, D.C. (1968).

10. It should be noted that while the "value of life"
terminology is convenient and frequently encountered
within the philosophical framework of the
livelihood procedure, it is strictly accurate only
because of the linearity assumption. If decision makers
are non-linear with respect to livelihood saving
(e.g., if they are not indifferent between (a) saving
one person's life (and livelihood) with certainty and
(b) saving one hundredth each of 100 persons'
livelihood), then one cannot even speak of the
"value of a life" within the context of the livelihood
measure. Within the context of willingness-to-pay
measures, it is meaningless to speak of "the
value of a life." In general, one can only refer to
the expected value per life saved at a given initial
risk of death and/or a given reduction in risk. Suppose
a given individual has an initial risk of death P,
and is offered a chance to reduce it by ΔP. If he
will be willing to pay an amount, X, to reduce the
risk, then we may refer to the value V (which
equals X/ΔP) as the expected value per life saved
for this set of circumstances. (It can also be viewed
as the amount that a large number of people similarly
affected and with similar tastes would pay, on
the average, for each life saved in their group.) In
general (because of risk aversion and because one's
budget constraint is affected by non-trivial changes
in risk of death), people will not be willing to pay
an amount 2X for a reduction in risk of 2ΔP.
Similarly, people whose initial risk is Q instead of
P will generally be willing to pay something other
than X for the same ΔP. We discuss some evidence
about amounts people are willing to pay for different
values of P and ΔP in Section IV.
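
The relation V = X/ΔP in note 10 can be checked with a few lines of Python. This is only a sketch: the function name and the dollar amounts are hypothetical and are not taken from the paper. It shows how a payment that is less than double for a doubled risk reduction lowers the implied value per life saved.

```python
def value_per_life_saved(payment, risk_reduction):
    """Expected value per life saved, V = X / delta_P (note 10)."""
    if risk_reduction <= 0:
        raise ValueError("risk_reduction must be positive")
    return payment / risk_reduction


# Hypothetical figures, for illustration only (not from the paper):
# a person pays X = $50 for a 1-in-10,000 reduction in risk of death.
v_small = value_per_life_saved(50.0, 1 / 10_000)

# For twice the reduction the same person offers $90 rather than $100,
# so the implied value per life saved falls.
v_large = value_per_life_saved(90.0, 2 / 10_000)

print(round(v_small), round(v_large))
```

Because V depends on both the starting risk and the size of the reduction, a single figure should be reported together with the P and ΔP at which it was elicited.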

11. J. Carlson, "Valuation of Life Saving," Ph.D.
Dissertation, Harvard University (1963).

12. See, for instance, E. Crammond, "The Cost of
the War," JOURNAL OF THE ROYAL STATISTICAL
SOCIETY, Series A, Vol. 78 (May 1915),
or H. Boag, "Human Capital and the
Cost of the War," JOURNAL OF THE ROYAL
STATISTICAL SOCIETY, Series A, Vol. 79,
7-17 (January 1916). For a review of some relevant
literature, see L. Dublin and A. Lotka, THE
MONEY VALUE OF A MAN, 1st and 2nd eds., The
Ronald Press Co., New York (1931 and 1946), or D.
Rice, "Estimating the Cost of Illness," AMERICAN
JOURNAL OF PUBLIC HEALTH, Vol. 57, No. 3,
424-440 (1967). More recently, the livelihood-saving
approach has been used in a number of
governmental evaluation studies. See, for example,
U.S. Department of Health, Education and Welfare,
DISEASE CONTROL PROGRAMS:
SELECTED DISEASE CONTROL PROGRAMS
(1966a) and HUMAN INVESTMENT PROGRAMS:
SELECTED HUMAN INVESTMENT
PROGRAMS (1966b). B. F. Kiker ("The Historical
Roots of Human Capital," JPE, Vol. 74, No. 5,
481-499 (1966)) and L. Thurow (INVESTMENT
IN HUMAN CAPITAL, Belmont, California
(1970)) have reviews of its general application to
other areas of analysis. D. Rice and B. Cooper
("The Economic Value of Human Life," AMERICAN
JOURNAL OF PUBLIC HEALTH, Vol. 57,
No. 11, 1954-1966 (1967)) have one of the most
extensively applied sets of livelihood tables.

14. That is, if the earnings in year $i$ are $E_i$, the
probability of surviving until year $i$ is $P_i$, and the
discount (or interest) rate is $r$, then the livelihood
of a person $n$ years old is

$$L_n = \sum_{i=n}^{\infty} \frac{P_i E_i}{(1+r)^{i-n}}$$
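
A minimal sketch of the livelihood calculation described in this note, in Python. The function name, the list-based inputs, and the three-year figures are assumptions made for illustration; they are not taken from the paper or from any published livelihood table. Index k in the lists stands for year i = n + k, so the discount exponent i - n is simply k.

```python
def livelihood(earnings, survival, rate):
    """Discounted expected future earnings of a person now aged n.

    earnings[k]: earnings E_i in year i = n + k
    survival[k]: probability P_i of surviving until year i = n + k
    rate:        discount (or interest) rate r
    """
    if len(earnings) != len(survival):
        raise ValueError("earnings and survival must have the same length")
    return sum(
        p * e / (1 + rate) ** k
        for k, (e, p) in enumerate(zip(earnings, survival))
    )


# Hypothetical three-year horizon, for illustration only.
example = livelihood(
    earnings=[10_000, 10_000, 10_000],
    survival=[1.0, 0.99, 0.98],
    rate=0.05,
)
print(round(example, 2))
```

In practice the sum runs over every remaining year of working life, with earnings and survival probabilities read from tables by age and sex.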



15. B. Conley ("The Value of Human Life in the
Demand for Safety," AMERICAN ECONOMIC
REVIEW, Vol. 66, No. 1 (45-55)) has recently argued
that changes in expected present value of
earnings provide a lower bound to individual
willingness to pay for lifesaving programs. This
conclusion requires a number of strong assumptions,
however, on the nature of individual preferences
and on a lack of interest by and for others in an
individual's lifesaving valuation. Further, Conley
recognizes that there is a range of income over
which his conclusions do not apply. He assumes
that this is at a very low level of income, but there
is no evidence to support or to refute this assumption.
P. Cook ("The Earnings Approach to Life
Valuation: Reply to Conley," Draft Paper (1976))
suggests some illustrative values for the parameters
of Conley's model which make it plausible that
this will not be a lower bound for a large class of
individuals.

16. Rice and Cooper, note 11 supra, and B. Cooper
and W. Brody ("1972 Lifetime Earnings by Age,
Sex, Race, and Educational Level," RESEARCH
AND STATISTICS NOTE, DHEW (September
30, 1975)) have a widely used set of such tables.

17. The logical extension of the viewpoint which
seems to motivate the livelihood procedure is to
argue that an individual's consumption should be
deducted from his earnings in calculating the
value of his life: that his value is equal to the
present value of the surplus he generates (note
again the analogy with the slave). One implication
of this "net livelihood" procedure is that society is
made better off by the death of those whose
expected net present value is negative, which is true
of retired people and those who are near retirement,
some of those receiving disability and public
assistance payments, some children, and so on.
Dissatisfaction with the implied judgment that society
should not expend any effort to extend the
lives of such people has led researchers to use income
without excluding consumption. See, among
others, R. Fein, THE ECONOMICS OF MENTAL
ILLNESS, Basic Books, New York (1958); Klarman,
note 1 supra; and M. Feldstein, COST-BENEFIT
ANALYSIS AND HEALTH PLANNING
IN DEVELOPING COUNTRIES, Discussion
Paper, Harvard University (1970).

18. J. P. Acton, EVALUATING PUBLIC PROGRAMS
TO SAVE LIVES: THE CASE OF
HEART ATTACKS, The Rand Corporation,
R-950-RC (1973).

19. Thirty-six of the respondents were selected
at random from three communities in Boston (half
men and half women); I ^ were men in a trade
union program, and 36 were in an advanced
management program at the Harvard Business School.
See Acton (note 18 supra, pp. 83-85) for a description
of these samples.

20. B. Weisbrod, "The Valuation of Human Capital,"
JPE, Vol. 69, No. 5, 425-436 (1961).

21. H. Klarman, "Syphilis Control Programs," in
Robert Dorfman, MEASURING THE BENEFITS
OF GOVERNMENT INVESTMENTS, The
Brookings Institution, Washington, D.C., 367-410
(1965).

22. Rice, note 11 supra.

23. M. Feldstein, note Ij supra.

24. For instance, one could examine the earnings of
women with similar education and training who
are employed full time in the market and impute
those earnings to the women who stay home. See
Posner, note 6 supra, pp. 79-80 for this opportunity
cost argument.

25. J. Morgan, I. Sirageldin, and N. Baerwaldt,
PRODUCTIVE AMERICANS: A SURVEY OF
HOW INDIVIDUALS CONTRIBUTE TO ECONOMIC
PROGRESS, University of Michigan, Survey
Research Monograph 43, Ann Arbor (1966).

26. K.E. Walker and W.H. Gauger, "The Dollar Value
of Household Work," Cornell University, New
York State College of Human Ecology, Information
Bulletin No. 60, Ithaca (June 1973).

27. Rice and Cooper (note 11 supra) assumed that
all nonemployed women contributed a full share
to home production and assigned the full-time
earnings of a domestic worker to those women,
about $2767 per year in 1964. They assigned no
other value for household production to others.
This implies, among other things, that it is
frequently better to save women who do not work
than it is to save women who work part-time. In
Cooper and Brody (note 6 supra) the value of
housework measured by Walker and Gauger (note
26 supra) was used, but no adjustment is made for
men or for changed productivity after age 65.

28. Rice and Cooper, note 11 supra.

29. Walker and Gauger, note 26 supra.

30. Morgan et al., note 25 supra.

31. J. P. Acton, MEASURING THE SOCIAL IMPACT
OF HEART AND CIRCULATORY
DISEASE PROGRAMS: PRELIMINARY
FRAMEWORK AND ESTIMATES, The Rand
Corporation, R-1697-NHLI (1975).

32. Id., Sec. IV.

33. After this work was completed, Dorothy Rice
(personal communication) informed me that the
domestic worker's earnings for 1972 were about
$4000. Resources did not permit recalculation of
all the human capital tables to adjust for this fact,
but we should note that it does not change the
character of the methodological and empirical
findings. If recalculated, the differential between
men and women would increase during the working
years and narrow somewhat over 65 years of
age. The average amount of the willingness-to-pay
measure would increase further over the human
capital amount.

34. J. Dupuit, "On the Measurement of the Utility
of Public Works," (1844) translation reprinted in
READINGS IN WELFARE ECONOMICS, K.
Arrow and T. Scitovsky, eds., R.D. Irwin,
Homewood, Illinois (1969).

35. See, for example, P.A. Samuelson, "The Pure
Theory of Public Expenditure," REVIEW OF
ECONOMICS AND STATISTICS, Vol. 36, No. 4,
387-389 (1954) and "Diagrammatic Exposition of
the Pure Theory of Public Expenditure," REVIEW
OF ECONOMICS AND STATISTICS, Vol. 37,
No. 4, 350-356 (1955); P. Bohm, "An Approach to
the Problem of Estimating the Demand for Public
Goods," SWEDISH JOURNAL OF ECONOMICS,
Vol. 73, No. 1, 55-66 (1971); M.S. Feldstein,
M.A. Piot, and T.K. Sundaresan, RESOURCE
ALLOCATION MODEL FOR PUBLIC HEALTH
PLANNING, A CASE STUDY OF TUBERCULOSIS
CONTROL, World Health Organization,
Geneva (1973); L.B. Lave and W.E. Weber,
"A Benefit-Cost Analysis of Auto Safety Features,"
APPLIED ECONOMICS, Vol. 2, No. 4, 265-275
(1970); E.J. Mishan, note 2 supra; and Zeckhauser,
note 1 supra.

36. See J. Tobin, "On Limiting the Domain of
Inequality," JOURNAL OF LAW AND ECONOMICS,
Vol. 13 (October 1970); A.M. Okun,
EQUALITY AND EFFICIENCY: THE BIG
TRADEOFF, The Brookings Institution, Washington,
D.C. (1975).

37. That is, effects that extend beyond the principal
economic agent. A good example of externalities
is the pollution that may be generated in
the production of some goods. Neither the
manufacturer nor the consumer of the good pays for
the smoke (at least until recently), although a
number of people experience the effects, would
like to see them reduced, and would be willing to
pay to have them reduced.

38. Drèze, in particular, has argued the merits of
using this procedure. See J. Drèze, "L'Utilité Sociale
d'une Vie Humaine," REVUE FRANÇAISE DE
RECHERCHE OPÉRATIONNELLE, Vol. 23, 93ff
(1962).

39. Thaler, note 1 supra.

40. R. Thaler and S. Rosen, "The Value of Saving
a Life: Evidence from the Labor Market," paper
presented at the NBER Conference on Income
and Wealth, Washington, D.C. (November, 1973).

41. R.S. Smith, "Compensating Wage Differentials
and Hazardous Work," study for U.S. Department
of Labor (August 1973).

42. D. Usher, "An Imputation to the Measure of
Economic Growth for Changes in Life Expectancy,"
in Milton Moss, ed., THE MEASUREMENT
OF ECONOMIC AND SOCIAL
PERFORMANCE, NBER, New York 195^5

43. Acton, note 18 supra.

44. That is, risk of injury is probably positively
correlated with risk of death. Omission of the first
variable will bias the coefficient of the second variable
away from zero, causing his estimates with the
first data file to be too high.

45. Rice and Cooper, note 11 supra.

46. Advocates of this approach include T. Schelling,
note 12 supra; V.D. Taylor, HOW MUCH IS
GOOD HEALTH WORTH?, The Rand Corporation,
P-3945 (1969); and J. Acton, note 18 supra.

47. Recently, a number of researchers have considered
the nature of the utility function that may
underlie an individual's willingness to pay for
lifesaving. H. Raiffa (PREFERENCES FOR
MULTIATTRIBUTED ALTERNATIVES, The Rand
Corporation (1969)) has shown under very general
assumptions that a self-interested person, living
alone (with no heir and a prepaid funeral), should
pay more for a given reduction in probability of
death if he is at a greater overall risk of death. J.
Pliskin, M. Weinstein, and R. Shepard (UTILITY
FUNCTIONS FOR LIFE YEARS AND HEALTH
STATUS, Harvard School of Public Health
(October 1'9*77)) and M. Weinstein, R. Shepard, and J.
Pliskin (DECISION-THEORETIC APPROACHES
TO VALUING A YEAR OF LIFE, Harvard
School of Public Health (January 1975)) consider
the valuing of life years as a problem in
multiattributed utility theory, where the joint or
conditional nature of the "good" being valued makes a
difference to the inferred value. P. Cook and D.
Graham ("The Demand for Insurance and Protection:
The Case of Irreplaceable Commodities,"
Draft paper (1975)) explore the relationship between
willingness to pay to avoid a loss and the
compensation required to make a person as well
off after a loss. M. Jones-Lee ("Valuation of
Reduction in Probability of Death by Road Accident,"
JOURNAL OF TRANSPORT ECONOMICS
AND POLICY, Vol. 3, No. 1, 37-47 (1969))
provides an analysis of the compensating variation
required for various changes in the probability of
death or injury. Usher (note 42 supra) and Conley
(note 13 supra) formulate the issue as a life-cycle
model in which the individual is assumed to try to
maximize his expected lifetime utility, which
depends directly on his consumption in each time
period. Actual application is rare, however, as
most writers have stopped with a theoretical
treatment or have chosen an admittedly inferior
technique for actual measurement.

'■ * 1 * 

48. J.L. Knetsch and R. Davis, "Comparisons of
Methods for Recreation Evaluation," (1966) in R.
Dorfman and N. Dorfman, ECONOMICS OF
THE ENVIRONMENT, W.W. Norton, New York.

49. Acton, note 18 supra. Related work includes
the survey of willingness to pay for selected disease
entities conducted by Palmatier, "Willingness to
Pay for Health Services: A Sampling of Consumer
Preferences," Unpublished paper, Department of
Economics, University of Southern California
(January 15, 1969); a prototype survey for
determining individual tradeoffs among attributes of
disease reduction programs was developed by E.
Keeler, MODELS OF DISEASE COSTS AND
THEIR USE IN MEDICAL RESEARCH
RESOURCE ALLOCATIONS, The Rand Corporation,
P-4537 (1970). R.L. Berg ("Establishing the
Values of Various Conditions of Life for a Health
Status Index," in R.L. Berg, ed., HEALTH
STATUS INDEXES, Hospital Research and
Educational Trust, Chicago (1973)) and G.W. Torrance,
D.L. Sackett, and W.H. Thomas ("Utility
Maximization Model for Program Evaluation: A
Demonstration Application," ibid.) have some
implied values for medical risk-taking based on the
responses of physicians in their role as proxy
decisionmaker for patients.

50. Part of the sample was a representative
community sample in the Boston area, and part was a
sample of young and middle-aged men in a business
school program. A variety of questionnaire
forms were used, so it is not possible to report
empirical results for the full sample of identical
questions. The questionnaire for these surveys is
contained in Acton (note 18 supra, Appendix).

51. Acton, note 18 supra, esp. pp. 92-105.

52. This finding is further evidence that individual
preferences do not follow the implications of a
livelihood-saving measure, which is strictly
proportional to income. We can infer both risk aversion
and an upper limit of willingness to pay for a
given mechanism of death reduction from these
data.

53. That is, the responses to question types (2)
were generally less than the responses to types (3),
which were generally less than responses to types
(4).

54. For instance, after thinking over what it might
be like to be confined to a bed for a long period of
time, his willingness to pay to avoid such disability
might change.

55. E. Lindahl, "Some Controversial Questions in
the Theory of Taxation," (1928), translated by E.
Henderson, reprinted in R. Musgrave and A.
Peacock, eds., CLASSICS IN THE THEORY OF
PUBLIC FINANCE, 214-232 (1958).

56. Acton, note 18 supra.

57. Bohm, note 35 supra.

58. J.H. Drèze and D. de la Vallée Poussin, "A
Tâtonnement Process for Public Goods," REVIEW OF
ECONOMIC STUDIES, No. 38, 133-150 (April
1971).

59. Bohm, note 35 supra.

60. P. Bohm, "Estimating Demand for Public
Goods: An Experiment," Reproduced, Department
of Economics, University of Stockholm (no
date).

61. For instance, you pay your actual maximum
willingness to pay, or you pay some fraction, or
you pay a proportion yet-to-be-determined, and so
forth.

62. Other means besides a willingness-to-pay survey
can be used to elicit the explicit values of
individuals, but none of them answers the operational
question of evaluation: How much should be spent
on programs that change people's chances of
death or disability? The exception to this assertion
is a scaling technique that employs von
Neumann-Morgenstern lotteries to determine a
utility function. C.R. Neu demonstrates that this is
formally equivalent to a willingness-to-pay approach
("The Use of Individual Preferences in the
Public Valuation of Life and Health," unpublished
Ph.D. Dissertation, Department of Economics,
Harvard University (1975)). The remaining techniques
cannot provide the operationally needed
answer. For instance, a variety of psychometric
scaling devices could be employed to measure
people's attitudes toward attributes of program
impact (say, death or disability), or their attitudes
toward programs (say, heart attack ambulance or
anti-hypertension programs). The results of such a
scaling, however, do not answer the fundamental
question of evaluation: Should scarce resources be
committed? Suppose I know that Program A
scores 8 and Program B scores 4 on a 10-point
scale where 0 is very bad and 10 is very good. We
do not know whether or not to undertake either
program. Suppose we include information about
program cost and define the status quo as 5 on the
scale; we would still not know if either program
should be undertaken. Furthermore, even if such
a scaling produced an indication that a program
should or should not be undertaken, the results
are of limited applicability because we know only
the valuation of a few programs rather than having
a procedure that can be generalized. Another
approach would be to ask people if they would like
to see more, less, or the same amount spent on a
given public program. If we then asked how much
more should be spent, and specified the person's
share of the cost, we would have a result equivalent
to willingness-to-pay results and would answer
the question of evaluation. Furthermore, if we ask
enough questions, this iteration will produce a
majority rule outcome, which has significant appeal
as a public decisionmaking criterion.



63. For instance, in Acton (note 18 supra) the
conclusions as to net benefit of five interventions for
out-of-hospital heart attacks were very similar
under both methods of evaluation.

64. That is, if we were to tax away an amount up to
the entire future earnings of individuals whose
lives were saved, then we would cover the costs of
such programs. In the absence of indentured
servitude, we may not always realize even this
situation.

65. Zeckhauser, note 1 supra.




Economic Analysis and the
Evaluation of Medical Programs

Jan Paul Acton, Ph.D.*
The Rand Corporation
Santa Monica, California 90406



I. Introduction

Economics is the science of scarcity. It is useful
in helping to answer such health program evaluation
questions as:

• Should a new program be launched (e.g.,
should we add a mobile rescue unit with
trained EMT's to an existing hospital emergency
service)?

• Did we get our money's worth from a program
that was started last year?

• Should we expand, contract, or eliminate an
existing program?

• Should we expand our emergency medical
program at the expense of another emergency
medical program?

• Should we transfer resources from one
non-emergency program to a particular
emergency medical program (e.g., should
the infrequently-used extra surgical suite be
converted into an extra ambulatory care
unit, or should it be the other way around)?

• Should we devote more of society's resources
to emergency medical services and less to
other social undertakings?

It is important to decide if the evaluation is ex
ante (before a program is undertaken) or ex post
(a retrospective analysis).

Economics is most helpful in analyzing the ex
ante funding decision.

Cost-effectiveness Analysis: This is an efficiency
criterion. It asks, is this the least costly way to
achieve a particular effect?

Benefit-Cost Analysis: It asks, should the program
be undertaken at all? That is, do the benefits
outweigh the costs?
Benefit-cost analysis has four parts:

1. Predicting the consequences of a
program, that is, assessing the probabilities.

2. Valuing the consequences or output, that
is, measuring the benefits.

3. Assessing the costs of the program.

4. Selecting the best alternative.

* The views expressed in this paper are the author's and do not
necessarily represent those of RAND or any of its Corporate Sponsors.

II. Predicting the Consequences

This is usually best done with the aid of a decision
tree and the use of both objective and subjective
probabilities.**

Major points to remember in assessing probabilities:

1. Most studies find that initial probability
distributions are too narrow. Spread them out:
admit it when you are uncertain!
2. Each person knows more about some topics
than others. Don't spread the distribution
too far when you do have a good basis for
judgment.
3. Make use of different experts for different
parts of the problem.
4. Most studies show that groups of people are
much more accurate than a single assessor.

Suggested additional reading: Raiffa's book is
an extremely fine and readable introduction to
how to be a practitioner of probability assessment.

**The decision tree is a display technique employed in decision
analysis for decisionmaking under uncertainty. Howard Raiffa, Decision
Analysis: Introductory Lectures on Choices Under Uncertainty, Reading,
Mass.: Addison-Wesley, 1968, is a good introductory book. The
handout material has an application in Jan Acton, Evaluating Public
Programs to Save Lives: The Case of Heart Attacks.

III. Valuing the Benefits

Three major alternatives exist:
1) Evidence from political process
2) Livelihood Saving (or Human Capital)
measures
3) Willingness-To-Pay (or Individual Preference)
measures.

Principal criticisms and comments about each
include:

1. Political Process: Few consistent pieces of
evidence on which to base evaluation. Implicit
values range from a few hundred to
over a million dollars per life saved.

2. Livelihood Saving: The most commonly
used technique in past studies. Widely
criticized because of the discriminatory
treatment of women, retired persons, those
who do not work, and those who will not
reach working age.

3. Willingness-to-pay measures are based on
the premise that individual preferences
should count in programs that affect
people's lives and happiness. Some work has
been done based on implicit valuations (for
instance, in extra hazard pay), but considerable
variability is observed. Preliminary evidence
suggests that people can respond well
to survey-type questions and yield useful
information, but additional work is needed.

These alternatives are discussed and criticized
in detail in the attachment by Jan Acton, Measuring
the Monetary Value of Lifesaving Programs,
P-3671.



IV. Selecting the Best Alternative

Major points to remember:

Don't use a benefit-cost ratio to choose. Select
the alternative with the greatest net benefit.

Don't just select the alternative with the greatest
reduction in mortality rates. Remember,
changing mortality rates may be more important
for some groups than for other groups of people.

Check for sensitivity to assumptions and data
used:

• Would the choice change if slightly different
measures of benefit were used?

• Would the choice change if the probabilities
were somewhat different?

• Would the choice change if the alternatives
available are slightly different?

If yes to any question, then try to sharpen the
data or values used.

Check for omitted factors and variables that
might tip the balance the other way. If the decision
seems sensitive to these omitted elements, try to
incorporate them formally in the analysis.
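
The two selection rules and the sensitivity check in Section IV can be sketched in Python. Everything in this block is hypothetical: the function names, the two alternatives, and their dollar figures are invented for illustration and do not come from the paper. The sketch shows a case where the benefit-cost ratio and net benefit disagree, and a simple test of whether the net-benefit choice survives rescaling of each benefit estimate.

```python
def best_by_net_benefit(alternatives):
    """alternatives maps name -> (benefit, cost); pick largest benefit - cost."""
    return max(alternatives, key=lambda n: alternatives[n][0] - alternatives[n][1])


def best_by_ratio(alternatives):
    """Pick the largest benefit/cost ratio (the rule the text warns against)."""
    return max(alternatives, key=lambda n: alternatives[n][0] / alternatives[n][1])


def choice_is_stable(alternatives, low, high, steps=5):
    """Rescale each benefit in turn between low and high and report
    whether the net-benefit choice ever changes."""
    baseline = best_by_net_benefit(alternatives)
    for name, (benefit, cost) in alternatives.items():
        for k in range(steps):
            scale = low + (high - low) * k / (steps - 1)
            trial = dict(alternatives)
            trial[name] = (benefit * scale, cost)
            if best_by_net_benefit(trial) != baseline:
                return False
    return True


# Hypothetical annual benefits and costs in dollars (illustration only).
alternatives = {
    "small program": (30_000, 10_000),    # ratio 3.0, net benefit 20,000
    "large program": (150_000, 100_000),  # ratio 1.5, net benefit 50,000
}

print(best_by_ratio(alternatives))        # small program
print(best_by_net_benefit(alternatives))  # large program
print(choice_is_stable(alternatives, 0.9, 1.1))
print(choice_is_stable(alternatives, 0.5, 1.5))
```

With these figures the choice holds when benefits are off by 10 percent but not when they may be off by 50 percent, which is the situation in which the text advises sharpening the data or values used.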

.ILLUSTRATION of Decision Analysis apphed ' 
the evaluation of two new programs for an existing 
emergehcv service These assumptions are admit- 
tedlv arbitrarv ancf somewhat unrealistic, but iRev 
illustrate the rjiethodologv, 

Assumptions:

• Two programs are available: one for treating heart attack victims, one for trauma victims. Only one can be selected. They cannot be combined.

• The outputs of both programs consist mainly in reducing the number of people dying; other outputs are not important. The program will apply to a population of 10,000 people.

• Both programs reduce the probability of death by [percentage illegible] for those eligible people reached.

• The probabilities of death, of calling for the program, and of being treated successfully are independent for each program.

• Heart attack and trauma events occur independently.

• The probability of calling the heart attack program, given a heart attack, is 50%.

• The probability of calling the trauma program, given a trauma event, is 80%.

• The heart attack program will be able to reach and help 80% of those who call.

• The heart attack program costs $100,000 per year.

• The trauma program costs $70,000 per year.

• The trauma population is younger and has a better prognosis if "saved" by the program. In the range of expected effectiveness, each person is willing to pay an average of $8 per year for each chance in 10,000 that the program reduces his chance of death.

• The heart attack population is somewhat older and has a worse prognosis if "saved" by the program. In the expected range of effectiveness, each person is willing to pay an average of [amount illegible] per year for each chance in 10,000 that the program reduces his probability of death.

Conventions:

We will designate points in the decision process where a choice must be made with a square. Chance nodes are indicated by a circle. Costs associated with action taken are indicated by a barrier across the pathway.



Figure 1: Current Situation, No New Program

Figure 2: Effect of Heart Attack Program

Figure 3: Effect of Trauma Program

Figure 4: Decision Alternatives (10,000 people). Adopt No New Program: 0 lives saved. Adopt Heart Attack Program: cost $100,000; 8 lives saved; benefit [illegible]. Adopt Trauma Program: cost $70,000; 3 lives saved; benefit $240,000.
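The trauma-program figures can be checked with a short calculation. This is a sketch under an inferred reading, not a formula stated by the author: it assumes the benefit equals the average per-person risk reduction (in chances per 10,000) times the stated willingness to pay times the population. The function name is hypothetical; the inputs (3 lives saved, 10,000 people, $8 per chance in 10,000, $70,000 cost) are those given in the assumptions and in Figure 4.

```python
def willingness_to_pay_benefit(lives_saved, population, wtp_per_chance_in_10000):
    """Total yearly benefit implied by willingness-to-pay valuation (assumed reading)."""
    # Average risk reduction per person, expressed in chances per 10,000.
    chances_per_10000 = lives_saved * 10_000 / population
    # Each person pays the stated amount per chance; sum over the whole population.
    return chances_per_10000 * wtp_per_chance_in_10000 * population


trauma_benefit = willingness_to_pay_benefit(
    lives_saved=3, population=10_000, wtp_per_chance_in_10000=8
)
trauma_net_benefit = trauma_benefit - 70_000  # subtract the yearly program cost

print(trauma_benefit)      # total benefit in dollars per year
print(trauma_net_benefit)  # net benefit in dollars per year
```

Under this reading the computed benefit matches the $240,000 shown for the trauma program. The heart attack program can be evaluated the same way once its willingness-to-pay figure is known, and the alternative with the larger net benefit is the one to select.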




Appropriateness and Feasibility of Randomized Field Tests

Robert F. Boruch

Northwestern University






When it is proposed to persons who run human services delivery systems that their programs should be evaluated by experimental methods, strong doubts are almost invariably expressed about the feasibility or even possibility of experiments in public service delivery systems. Robert Boruch, an evaluation methodologist, has identified hundreds of experiments carried out in just such settings. This paper summarizes his experiences and views. It presents a strong rationale for evaluation, an overview of problems and methods involved in program evaluation, and the case for conducting randomized experiments.



1. Introduction

This paper reviews briefly what we have learned about appropriateness in mounting field experiments to plan and evaluate social programs and about the feasibility of such tests. "Appropriateness" is considered here as a kind of precondition for feasibility, one which exercises a direct impact on the level and nature of a subsequent feasibility study. Feasibility here concerns those conditions which enhance or detract from the successful conduct of an experiment. This discussion depends heavily on studies of efforts to foster the use of randomized tests of programs in field settings. We adhere to the following outline:

2. Appropriateness of Evaluation and, in Particular, of Randomized Field Tests

3. Historical Precedent as a General Test of Feasibility

4. Pilot Feasibility Experiments as a Test of Feasibility

5. Direct [words illegible] of Randomized Tests

The Bibliography attached provides some background support, in the form of field tests actually mounted, for the opinions offered here.

2. Appropriateness of Evaluation and, in Particular, of Randomized Experiments

Several questions generally need to be answered before an experiment is considered, much less mounted. The answers to them serve not only as guides in deciding whether and what to evaluate, but also determine the subsequent feasibility of an experiment. Those questions, discussed very briefly in the following remarks, include:

2.1 Is there any interest in evaluation, much less an experimental test?

2.2 Is an impact evaluation rather than some other type appropriate in the setting at hand?

2.3 Are the effects of the program currently debatable?

2.4 If so, what is the proper standard for an impact evaluation?

2.5 Will methods other than a randomized experiment suffice for impact evaluation?

2.1 Interest in Evaluation

If sponsors of a program have an interest in obtaining a fair appraisal of a program's effectiveness, relative to any standard, then mounting an evaluation, randomized or otherwise, is considerably more feasible. It is doubtful, for example, that Career Education programs supported by the National Institute of Education would have or could have evaluated themselves without encouragement and demands made by the agency. That a sponsor's support is an insufficient source of interest is also clear from cases in which, despite the sponsor's demands, rigorous evaluations have been subverted by program staff.

So program staff and developer interest is also a determinant of the feasibility of any evaluation. Reputable program developers will often agree that an evaluation is necessary, as will program staff. But assuring that the interest is not honorific is altogether a different matter. Some strategies for assuring cooperation with staff must be worked out beforehand. Some of these are discussed in Riecken et al. (1974) and in Section 5 below.

The client population usually has some vested interest in the outcome of an evaluation. And this interest is most often exemplified in ways other than active collaboration in a rigorous evaluation. Often, the difficulties in experiments turn around the randomization process. Tactics for determining and enhancing feasibility are discussed in Section 5 for the particular case of experimental evaluations.







If there is no interest in a fair estimate of program quality from any of these quarters, then an evaluation, experimental or otherwise, is likely to be of little use to anyone except the individual conducting the evaluation. If there is active opposition from one or more of these quarters, matters become difficult indeed.



2.2 Type of Evaluation 

The "evaluation of a program" often implies a disparate array of activities. And to avoid needless confusion, one ought to recognize the legitimacy of several functional categories of evaluation:

evaluation of program objectives

evaluation of program process or operations

evaluation of cost/benefit ratios

evaluation of impact.

Each of these is related to more elaborate taxonomies of evaluation activity generated by Federal agencies (e.g., U.S. AID) and especially by academic researchers (e.g., Stake). The taxonomies are a useful guide through the thicket of types and methods, but we focus on only four here for simplicity's sake.

The first category, evaluation of program objectives, involves assigning political, ethical, or social values to the announced goals of the program. Typically, this type of evaluation is tied to real or imagined needs of a target group; it is implicit in most policy development and policy criticism; and it is based on information which purports to show that there is a social problem and that a program is one way to ameliorate the difficulty.

The second category, evaluation of program activities, involves determining whether and how well some well-specified standards for implementing the program are met. This class of activities is often managerial in its orientation, addressing questions such as: Is there a clearly specified product being developed? Is the product or service being offered to the proper target group? Is the product accepted and, if so, to what extent? How much does the program system cost? A second major perspective is also relevant here and is more technological in character. The expert program practitioner or judge may ask whether the program's elements and conduct are consistent with the state of the art in the relevant discipline, and whether there are any remarkable inconsistencies, nonuniformity, or deviations. The standards here are those of a discipline, firmer perhaps than the attachment of social or political values to program goals; they focus on the immediate scientific common sense of a program rather than on ultimate outcome, and that too is important.

The third class of common activities is the cost/benefit analysis. This covers a variety of sins, but most often involves assuming that there is indeed a program benefit and assuming that the benefit has some value. The costs have been traditionally a bit easier to pin down. The objective, given these assumptions, is to provide a criterion for determining how scarce resources ought to be allocated when there are numerous competing demands for those resources. Again, whether the program does indeed have an effect is generally assumed, taken for granted, or judged relative to a traditional standard for which there is some consensus.

The final form of evaluation, and the one which interests us most here, concerns the relative effects of the program, i.e., impact evaluation. It attempts to answer questions such as the following: Which of two education programs enhances student achievement or ability or attitude most? Does a new surgical treatment have fewer side effects than the current one? Which of several health education programs has the largest effect on actual health status of individuals or cities or regions? In each case, one asks how the program, or service, or delivery mode, works with respect to some standard or alternative.

Each category of evaluation is legitimate and important. And, of course, nothing prevents each type from being conducted simultaneously. Indeed, most major program evaluations include features of each type. The first and third categories are generally more feasible than the second and the last. But the information they provide differs in each case. Whether one or another category is most appropriate depends heavily on the interest of the principal sponsor of an evaluation.

2.3 Evidence on Impact

If a program's effect on a target population is already known to be positive and its magnitude and cost are similarly well established, there appears to be very little point to conducting an impact evaluation, whether randomized or not. Studies undertaken for strictly scientific reasons, rather than for the sake of policy planning and development, are an exception, and this case we put aside for the moment.

In most instances the need for evaluation arises because there is some honest disagreement among experts about the nature of an effect. The lack of agreement or even of informed opinion may stem from the fact that the program is a completely novel one, as many innovative social programs are advertised to be. Or the disagreement may stem from previous research which permits only the most equivocal of inferences about the nature of a treatment's effect. The Negative Income Tax Experiment, for example, was mounted because regression, covariance, and other correlational research techniques were insufficient for supporting major policy decisions: the effects of various levels of income subsidy on work behavior and so on could not be predicted with sufficient accuracy or with a sufficiently low level of ambiguity. Similarly, equivocal data accumulated over the past 15 years have led to the development of the current national clinical trials to test experimentally the effect of special diets and drugs on arteriosclerosis.

Disagreement here applies not only to the program itself but also to the manner of its delivery. It is well known, for example, that certain nutritional supplements have positive and detectable effects on the physical development of children. But how to manufacture, deliver, and encourage acceptance of such supplements among malnourished children in depressed regions is often not at all clear. The agreement of judges that evidence on best methods of delivery is scanty serves as a justification for impact evaluation, including randomized tests of alternative methods of delivery and encouragement.

Similarly, disagreement may occur about components of a program rather than about the total program. Alternative methods of screening individuals, of training service delivery staff, of referral service staff, or of program recipients, and so on may not be central to a complex program, but may indeed warrant impact evaluation.

2.4 The Standard and Impact Evaluation

Two kinds of standards are pertinent in deciding what type of impact evaluation is appropriate, and in settling on a randomized experiment as the design of choice. The first kind concerns the standard against which estimates of impact should be judged. The second concerns standards for judging the equivocality or bias in estimates of program effects.

Standard for magnitude estimation. One can of course choose a historical precedent to gauge the impact of treatment. In the ideal case, one has a long stable time series available, the program is introduced abruptly, and the program effect is gauged by its effect on the time series. There may be other similarly ideal empirical ways to specify null conditions, that is, how things are in the absence of any extraordinary program effect. They may include naturally occurring, entirely equivalent comparison or control groups.

Or, the standard against which effects are judged, the null condition, may also be specified by assumption or by fiat. In the former case, for example, one might be willing to assume, based on theory, common sense, or whatever, that there will be absolutely no improvement in the condition of a mentally retarded group without a program. In the latter case one might specify, as Nixon did, that if a crime reduction of 10% occurs, then the program (whether it is really in the field or not) will be declared a success.
Now any of these standards may, in particular instances, be quite appropriate. Enough may be known from theory to specify the null condition quite accurately. There may be sufficient theory and data to specify the baseline standard well. And in some instances, the use of these options is fine.

The problem, however, is that in social program evaluation, neither theory nor prior data are sufficient for specifying null conditions adequately, for assuring that the supposed standard of comparison is a fair one. Furthermore, even the theory which does exist may be insufficient for coping with the competing explanations for the finding that an effect is significant. The effect found may stem from influences completely outside the program, it may have been a continuation of an unrecognized trend, and so on.

The randomized experiment is, in this context, most appropriate when null conditions cannot be prespecified well from prior data, by assumption, or by fiat. That is, it sets up a timely comparison group whose equivalence to a treated group is guaranteed in the long run and which can be used as a fair and reasonable benchmark for estimating program effects. The experiment also reduces the equivocality problem notably. The number and plausibility of competing explanations can be reduced.

The standard for equivocality of inference. The benefit of randomized experiments is that if they are conducted properly, the judgments one can make about the existence and size of an effect are less susceptible to attack. That is, other methods may produce an estimate of program impact which is susceptible to bias, due to unrecognized influences, extraneous factors, and so on. There is a fine state of the art in identifying competing explanations for findings derived from observational (nonrandomized) evaluations, and it will not be discussed here. See, for example, Campbell and Stanley's (1966) classic monograph or a revised edition, Cook and Campbell (1976).

There exists, however, no formal technique for attaching a "level of equivocality" to the findings from quasi-experimental studies. Whether such a system could be drawn up depends heavily on the particular substantive area and on whether the competing explanations are plausible or realistic. Establishing the tenability of the last item, regarding realism, brings us to another criterion in establishing the appropriateness (and consequently, feasibility) of randomized experiments: Can methods which do not rely on randomized assignment yield estimates of program effect which are close to those which one might obtain in an experiment? Some tentative answers to the question are given in the next section.

2.5 Possibly Suitable Alternatives to Randomized Trials

The basic idea here is that one ought to determine if randomized experimental tests are unnecessary because we might be able to use a variety of quasi-experimental and (or) algebraic adjustments to obtain unbiased estimates of program effect. The exact conditions under which a randomized experiment will yield the same estimate of program effect as a nonexperiment are, in principle, specifiable beforehand. However, determining whether those conditions are actually met in the field is usually difficult and often impossible. One simply does not know whether the analytical conditions assumed for the nonrandomized evaluation and analysis hold in reality. Consequently, many such evaluations cannot be used to support contentions about program impact. That the problem is a persistent one is evident from reviews by Wargo and his colleagues (1971) and by Bernstein and Freeman (1975) of evaluations of Federally-subsidized social programs: In the majority of the nonrandomized evaluations there were competing explanations for the findings, explanations which could not be ruled out on common sense grounds or on the basis of the empirical data collected in the evaluations.
To get an empirical fix on the matter we can try an approach geared chiefly toward understanding the limits of statistical manipulation. Here, one locates (or conducts) a randomized experimental test of a program, and in addition, collects sufficient nonrandomized data to support an ostensibly appropriate quasi-experimental assessment of the same program. Suppose, for example, that data are obtained on individuals who have been randomly assigned either to a treatment program (T) or to a control condition (C). Similar data are also collected on an additional group (C*) whose members, though not randomly assigned, are regarded as equivalent to members of the C group, and of the T group prior to treatment. The question is then posed: How does the estimate of program effect based on ordinary analysis of variance of the T-C groups compare with an estimate of effect based on the T-C* groups and conventional statistical techniques such as matching, covariance analysis, or change score analyses? The answer is important insofar as it helps us to understand the nature and direction of bias that may be obtained when using techniques such as covariance analysis, which purportedly yield unbiased estimates of effect without randomization.
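The T, C, and C* comparison can be sketched as a small simulation. Everything numeric here is an assumption made for illustration (a true effect of 5 points, a C* group whose baseline is 3 points lower, normally distributed scores); none of it comes from the studies cited, and the function name is hypothetical.

```python
import random
from statistics import fmean


def simulate_estimates(n=5000, true_effect=5.0, selection_gap=3.0, seed=0):
    """Compare a randomized (T vs. C) estimate with a nonrandomized (T vs. C*) one."""
    rng = random.Random(seed)

    def draw(mean):
        # Outcome scores for one group of n people (standard deviation of 10).
        return [rng.gauss(mean, 10.0) for _ in range(n)]

    # T and C come from the same population because assignment is random.
    treated = [score + true_effect for score in draw(50.0)]
    control = draw(50.0)
    # C* looks similar but is drawn from a population with a lower baseline.
    comparison = draw(50.0 - selection_gap)

    randomized = fmean(treated) - fmean(control)
    nonrandomized = fmean(treated) - fmean(comparison)
    return randomized, nonrandomized


randomized, nonrandomized = simulate_estimates()
print("T - C  estimate:", round(randomized, 2))
print("T - C* estimate:", round(nonrandomized, 2))
```

In this sketch the T-C estimate lands near the assumed true effect, while the T-C* estimate absorbs the unobserved baseline gap as well. In a real evaluation the size and sign of that gap are exactly what is unknown.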

That estimates of effect will often (but not always) be biased if we rely solely on nonexperimental evidence becomes obvious with some concrete examples. Consider the simplest form of nonexperimental analysis: comparing the condition of program recipients before the program's introduction to their condition afterward. The before-after (or pretest-post-test) approach is common despite the fact that any increase or decrease in average condition may be entirely attributable to unrecognized growth or development processes.



In the Michigan arthritis studies, for example, severity of condition increased after the introduction of an arthritis treatment program. Based on this information alone, we might erroneously conclude that the program's effect was negative, i.e., it actually harmed program participants. In fact, we know from randomized experimental tests that the equivalent control group's condition deteriorated even further, and consequently, the proper inference is that the program did indeed have a beneficial effect (see Deniston & Rosenstock, 1972).

In before-after evaluations of compensatory education programs, cognitive scores may increase, decrease, or remain stable. The change tells us virtually nothing about the program impact simply because we usually do not know, for the subgroup tested and for the particular test, what the change would have been in the absence of the program (see Wargo et al., 1971).

Usually one attempts to find a comparison group against which to gauge the condition of program participants, and also to reduce the equivocality underlying most before-after designs. But this is also hazardous to the extent that the comparison group differs systematically and often in unknowable ways from the participant group:

For example, one facet of the Salk vaccine trials involved comparing volunteer vaccine recipients to an allegedly equivalent, "natural" comparison group of nonrecipients. The vaccine's effect in this nonrandomized quasi-experiment was positive. An estimate based on a second facet of the trials, randomized tests, gave estimates of effect which were 14% higher than the value based on the nonrandom tests. Given only the evidence from the nonrandom groups then, we would have concluded that the vaccine was notably less effective than it actually was in reducing polio incidence (Meier, 1972).

In randomized tests of a retardation rehabilitation program, Heber et al. (1972) collected data on an additional plausibly equivalent comparison group: siblings of children enrolled in the program. The difference in observed IQ between program participants and nonparticipants in the randomized test was about 36 points. A comparison of program recipients against their siblings (an ostensibly equivalent contrast group) yielded a 40-point difference. Had we relied solely on the "natural" comparison group, we would have overestimated the program's impact in this instance.

At this point, the statistically knowledgeable and critical reader might observe that there are algebraic techniques which purportedly "adjust out" differences between groups and which equate groups which differ initially, in order to avoid biases such as these. The techniques (matching program participants and nonparticipants with respect to their demographic or other characteristics, covariance or regression analysis) are sophisticated but do require strong assumptions about the underlying nature of the data. More importantly, those assumptions may not be an adequate picture of reality, i.e., of how individuals will behave in the absence of the program intervention. To be specific, when groups differ initially and the difference persists, then these methods will not perform adequately if the matching variables or covariates are measured imperfectly or incompletely. Some of the more advanced techniques accommodate the problem of fallible measures reasonably well, provided that reliability of the data is not too low (e.g., Porter, 1967). But none accommodates the specification problem satisfactorily; in many cases, we are very likely to leave out variables which are important but which are unmeasured or unmeasurable. In either case, the adjustment process is imperfect, and estimates of program effect will often be biased. How often will they be biased? It is impossible to say, but a few examples may help to illustrate the problem.

In the Michigan Arthritis Study, a comparison group was identified, differences between this group and program participants were reduced by matching individuals, and estimates of program effect were obtained. The estimate of effect based on this comparison is near zero; that is, despite selection of a matched group, the estimate obtained by comparing these individuals to the program participants is biased, relative to the estimate obtained from the completely randomized data (Deniston & Rosenstock, 1972).

The Middlestart program was designed by Yinger, Ikeda, and Laycock (1967) as a special pre-college program for promising high-school students. In their original evaluation, some students were assigned randomly to participant and control groups. Others were assigned on the basis of "post-facto matching". That is, five sets of treatment and comparison groups were constructed; they were not randomized and were equivalent only in the sense that they were matched on the basis of their demographic characteristics. If one examines the pooled data, one finds a significant difference of about six months in grade equivalent achievement test scores between participants and nonparticipants. However, if one examines only the randomized set of students, the estimate is far lower and quite negligible. In this case, the nonrandomized comparisons yield estimates of effect ranging from zero to a two-year difference in achievement test scores (Boruch, Magidson, & Davis, 1976).
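The earlier point about imperfectly measured covariates can be sketched numerically. The simulation below assumes a program with no true effect, two groups whose true ability differs by 5 points, and a pretest that measures ability with a chosen reliability; these settings and the function name are illustrative assumptions, not figures from the studies cited. The adjustment used is the textbook covariance adjustment with a pooled within-group slope.

```python
import random
from statistics import fmean


def ancova_effect(reliability, n=5000, gap=5.0, seed=1):
    """Covariance-adjusted 'effect' when the true program effect is zero."""
    rng = random.Random(seed)
    true_sd = 10.0
    # Error variance chosen so that var(true) / var(observed) equals reliability.
    error_sd = true_sd * ((1.0 - reliability) / reliability) ** 0.5

    groups = {}
    for name, mean in (("program", 50.0 + gap), ("comparison", 50.0)):
        ability = [rng.gauss(mean, true_sd) for _ in range(n)]
        pretest = [a + rng.gauss(0.0, error_sd) for a in ability]  # fallible covariate
        outcome = [a + rng.gauss(0.0, 5.0) for a in ability]       # no program effect
        groups[name] = (pretest, outcome)

    # Pooled within-group slope of outcome on pretest.
    sxy = sxx = 0.0
    for x, y in groups.values():
        mx, my = fmean(x), fmean(y)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        sxx += sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx

    (xp, yp), (xc, yc) = groups["program"], groups["comparison"]
    # Adjusted difference: raw outcome gap minus slope times the pretest gap.
    return (fmean(yp) - fmean(yc)) - slope * (fmean(xp) - fmean(xc))


print("perfectly measured covariate:", round(ancova_effect(1.0), 2))
print("reliability of 0.5:          ", round(ancova_effect(0.5), 2))
```

With a perfectly reliable pretest the adjusted estimate sits near the true value of zero. With reliability of one half, roughly half of the initial gap survives the adjustment and would be mistaken for a program effect.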



Time-series designs are also a promising approach to estimating program impact. Here one observes some outcome variable over time (e.g., rape rate over the last three years), introduces the program, and then tries to detect subsequent change in the variable (e.g., a drop in incidence of rape). The time-series approach is promising to the extent that there is no good competing explanation for the change in the outcome variable, such as changes in the accuracy of measuring the incidence of rape, and to the extent that the time series is suitable, so that a discontinuity will be obvious if it occurs. That time-series analysis is often not possible and that it will often yield estimates which differ from those based on experimental evidence is also clear, however.

Considering the Cali (Colombia) evaluation of nutrition and education programs, we find that an estimate of program effect based on short time-series projection from the control group is biased downward drastically. The time-series estimate of effect on children's cognitive skills is half the size of the effect based on test scores of randomized recipient and nonrecipient groups. The bias would be smaller if a much longer time series had been available (see McKay, McKay, & Sinisterra, 1973).

Time-series data on polio incidence prior to the Salk trials were insufficiently valid and comprehensive to support credible time-series estimates of the vaccine's impact. Similarly, novel programs such as the Career Education Projects supported by the National Institute of Education, the Headstart variations efforts of the U.S. Office of Education, and others could not be evaluated on the basis of time-series analysis simply because valid, stable time-series data on important outcome variables are unavailable.

In the Michigan Arthritis Study, time-series estimates of effect were 10% higher than estimates based on randomized experimental tests in the same populations.
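The time-series logic described above (project the pre-program trend forward, then compare it with what is observed) can be sketched as follows. The yearly values are hypothetical and deliberately tidy, and the straight-line projection is one simple choice among many; it is not the method used in any of the studies cited.

```python
def fit_line(values):
    """Ordinary least-squares line through (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical outcome rate for six periods before the program starts.
pre_program = [100, 98, 96, 94, 92, 90]
observed_after = 80  # hypothetical value in the first period with the program

slope, intercept = fit_line(pre_program)
projected = intercept + slope * len(pre_program)  # trend carried one period ahead
trend_adjusted_effect = observed_after - projected
naive_effect = observed_after - pre_program[-1]   # simple before-after difference

print("projected without program:", projected)
print("trend-adjusted effect:", trend_adjusted_effect)
print("naive before-after effect:", naive_effect)
```

Because the series was already falling, the naive before-after difference overstates the drop that can be credited to the program. With only a few pre-program points the fitted trend itself is fragile, which is the difficulty the short-series examples above describe.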

Of course, there have been studies employing much less competent methodology than even the imperfect ones we have described which have also led to erroneous conclusions. The more dramatic examples have occurred in medicine, where medical or surgical remedies, adopted on the basis of very weak evidence, have been found to be of no use at best and to be damaging to the patient at worst.

The so-called frozen stomach approach to surgical treatment of duodenal ulcers, for example, was used by a variety of physicians who simply imitated the technique of an expert surgeon. Later experimental tests showed prognoses were good simply because the originating surgeon was good at surgery and not because his innovation was effective. It provided no benefit over conventional surgery (Ruffin et al., 1969).

Anticoagulant drug treatment of stroke victims had prior to 1970 received considerable endorsement by physicians who relied solely on informal observational data for their opinions. Subsequent randomized experimental tests showed not only that a class of such drugs had no detectable positive effects but that they could be damaging to the patients' health [see Hill et al. (1960) and other examples described in Rutstein's (1969) excellent article].

None of this should be taken to mean that estimates of program impact based on experiments will always differ in magnitude from those based on nonrandomized assessments. The estimators will be close, for example, if there is no systematic difference between characteristics of the individuals assigned to one program variation and those assigned to another. If in a particular research project there is no systematic association (i.e., there is a kind of natural randomization process) or if such differences can be removed statistically, then we may expect various types of designs to produce similar results.

We have been able to document few instances of this occurrence, however. The first stage of Daniels's evaluation of the DANN Mental Health program, for instance, involved allocation of incoming patients to the experimental treatment ward on the basis of the number of beds available in each. Controlled (deliberate) randomization was introduced after ward turnover rate had stabilized. Comparisons of the characteristics of ward entrants prior to their treatment in the first nonrandomized stage to the characteristics of entrants admitted in the second (deliberately randomized) stage showed no important measurable differences between the groups. More importantly, separate analyses of the nonrandomized and randomized groups yielded very similar estimates of program effect.

An essential condition for similarity of estimates
is that prior to program introduction, there be no
systematic association between characteristics of
eligible program candidates and their participation
in the program. The association may be slight enough
at times to give us some confidence that the program
effect is in the proper direction even if we
recognize that the magnitude of the estimator is
likely to be in error. Holt's (1974) evaluative
studies of sentence reduction in prisons are
informative in this respect. A number of
nonrandomized studies on early versus late release of
individuals from prison suggested that length of
sentence (within certain limits) had no impact on
post-prison behavior. Later randomized, experimental
tests demonstrated that the direction but not the
magnitude of the early estimates of the effect of
early release was appropriate.

In each of these cases, as in others (see
Boruch, 1975), randomized tests were needed to
verify that unobserved influences were not entirely
responsible for the results obtained in
nonexperimental studies. More specifically, the
Daniels experiment helped to rule out the possibility
that program effects estimated from the
nonexperimental data of the first stage were
attributable to subtle differences in patients
assigned to each ward rather than to the ward program
itself. In the Holt work, the experiment helped to
demonstrate that the success of early releases was
not entirely attributable simply to very expert
judgments by parole boards about the likelihood of a
parolee's returning to prison, but that the length
of sentence actually has no discernible effect on
recidivism within certain limits.

Remarks. It is clear that in some nonrandomized
evaluations attempts to statistically "adjust out"
preexisting differences between treatment and
nonequivalent comparison groups can lead to biased
estimates of the treatment effect. The direction of
these statistical biases in certain stereotypical
cases can be such that the treatment will appear to
have had a negative effect. Biases of this sort
probably underlie some evaluators' declarations that
Headstart programs and Manpower Development and
Training Act Programs had a detrimental effect on
program participants. Some of the conditions under
which the statistical biases may appear are
described, along with examples, by Campbell and
Boruch (1975) and Boruch (1975). To better gauge the
extent to which new statistical approaches to
analyzing nonrandomized data actually avoid this
problem, the Project on Secondary Analysis (Boruch,
Wortman, & DeGracie, 1975) is applying competing
methods of analysis to the same data set and
documenting the biases underlying each method.
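
One mechanism that can produce a spurious negative effect is adjustment on an unreliable pretest when the comparison group starts out ahead. The sketch below is a hypothetical illustration of that single mechanism. It is not a reanalysis of any evaluation mentioned here, and the group means, error variances, and sample sizes are assumptions. The program is given a true effect of zero, and the usual covariance-adjusted difference is computed from the pooled within-group slope.

```python
import random
from statistics import mean


def adjusted_effect(n=20000, true_effect=0.0, seed=1):
    """Covariance-adjusted program effect with a fallible pretest.

    The program group is drawn from a population whose mean ability is one
    standard deviation below that of the comparison group (an assumption).
    """
    rng = random.Random(seed)
    groups = {"program": [], "comparison": []}
    for name, ability_mean in (("program", 0.0), ("comparison", 1.0)):
        for _ in range(n):
            ability = rng.gauss(ability_mean, 1)
            pretest = ability + rng.gauss(0, 1)  # measured with error
            effect = true_effect if name == "program" else 0.0
            posttest = ability + effect + rng.gauss(0, 1)
            groups[name].append((pretest, posttest))

    # Pooled within-group slope of posttest on pretest.
    sxy = sxx = 0.0
    means = {}
    for name, rows in groups.items():
        mx = mean(x for x, _ in rows)
        my = mean(y for _, y in rows)
        means[name] = (mx, my)
        sxy += sum((x - mx) * (y - my) for x, y in rows)
        sxx += sum((x - mx) ** 2 for x, _ in rows)
    slope = sxy / sxx

    (px, py), (cx, cy) = means["program"], means["comparison"]
    # Posttest difference, minus what the pretest difference "explains".
    return (py - cy) - slope * (px - cx)


if __name__ == "__main__":
    print(round(adjusted_effect(), 2))
```

With these assumed variances the within-group slope is about one half, so the adjustment removes only about half of the preexisting difference. The adjusted estimate should come out near -0.5 even though the true effect is zero, which makes a harmless program look harmful.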

Other researchers are conducting investigations
along related but distinctive lines. That research
is often supported by special divisions within
Federal agencies: NIE's Program on Measurement and
Methodology (Porter, 1975), HEW's National Center for
Health Services Research, and AID's Division of
Methodology (Technical Assistance Bureau). These
divisions are designed to foster methodological
investigations and should help to identify approaches
to evaluation which have far fewer technical
weaknesses and greater flexibility in the field than
those currently available.

3. Historical Precedent as a General Test of
Feasibility

The idea that experiments are an ideal but
impractical method for estimating relative program
effects is often proposed. But the contention about
impracticality is rarely supported with any evidence
or analysis. In fact, just a little homework can
yield a good deal of information about experimental
tests which have been mounted. And that information
can be used at least as contextual or background
evidence for making a crude judgement about
feasibility of experimental tests on the program at
hand.

The Bibliography of this paper, for example,
contains excerpts from a list of more than 200
experiments (Boruch, 1974) and illustrates the
remarkable variety of social programs which have been
subjected to experimental field test. In the economic
arena (Section IX), for instance, the Negative Income
Tax experiments, the Housing Allowance experiment,
and the Health Insurance Experiments represent
remarkable efforts to determine the best of
alternative economic subsidy plans. There have been
dramatic judicial experiments (Section II) which
demonstrate the feasibility of randomized appraisals
of the effectiveness of changes in judicial rules and
practices. Experiments have been successfully mounted
to assess the effects of police training programs
(Section XI) and rehabilitation programs for juvenile
and adult offenders (Sections I, II, XII), and
programmatic decisions have been based on the results
of these. Socio-medical (Section V) and mental
rehabilitation experiments (Section III) are
represented here and abroad. Educational experiments
are quite common, and although most are rather small,
the Cali (Colombia) experiments on compensatory
education for nutritionally deprived children, the
research on "Sesame Street" and "The Electric
Company" in television-based education, and at least
a dozen others involve sizable samples, complex
programs, and high-quality investigation (Section
IV). There have been a large number of experiments
conducted to identify superior methods of assuring
quality and completeness of information transmission
in audits and surveys (Section VI); most have been
designed in the broader context of Federal
data-collection efforts, and they provide good
evidence for choosing tactics to accomplish part of
that mission. Because some experiments which take
place in industrial settings are relevant to groups
often targeted for social programs (the aged, the
poor), illustrative experiments in this context have
also been included (Section VIII).

Experiments vary in other ways. Some experiments,
for example, have been conducted to estimate the
impact of important, very small elements of a very
complex treatment; e.g., laboratory research on the
most effective size of letters and numbers in
television broadcasts was conducted prior to
large-scale evaluation of the more complex total
Electric Company program. Others, like the Negative
Income Tax Experiment, involve more simple and
homogeneous "programs": the provision of income
subsidy, the administration of a rule, etc. There is
a surprising variety in the target of randomized
assignment: children, in assessments of many
education programs; adults, in all substantive
program categories; families, in economic
experiments; neighborhoods, in fertility control and
communications experiments; hospitals, school
districts, and others. Many of the references cited
in the Bibliography report on only one experimental
test in a series of simultaneous replications, as in
the Negative Income Tax Experiments (in Wisconsin,
New Jersey, Indiana, and Colorado). A series may
consist of a sequence of experiments and
quasi-experiments dedicated to long-range
development, testing, and revision of a program. The
Goodwin-Sanders (1972) work exemplifies this last
strategy; it involved sequential assessments of
tape-playing devices for education, used on school
buses en route to the children's homes.

Some experiments have not been implemented
completely, of course. For example, the Hornik et al.
(1973) assessment of television education in El
Salvador was designed in part as a randomized
experiment, but the randomization procedure failed in
the face of what appear to have been insurmountable
administrative difficulties in the evaluation.
Similarly, efforts to conduct randomized tests have
at times been unsuccessful in assessments of
delinquency programs (Clarke & Cornish, 1969),
education programs (ON\erts et al., 1974), and
elsewhere. Still, many experiments have been mounted
successfully by designing the study to accommodate
political and social factors which might otherwise
undermine randomization and valid measurement: for
example, the Manhattan Bail Bond experiments, which
conflicted with the vested interests of bail
bondsmen; experimental tests of nurse-practitioner
programs by Sacket (1973), which conflicted with the
interests of some physicians; and others. In fact,
most of the experiments listed in the Bibliography
did accomplish planned randomization.

Outright failures of randomization undoubtedly
occur more frequently than the Bibliography suggests,
and, of course, the reasons for failure are
important. The only systematic analyses of those
reasons available so far, however, are Conner's
(1974) set of case studies and our own analysis.
Conner identified the directness of the evaluator's
role in the randomization process as a key ingredient
of success. Other ingredients are important, but the
current scarcity of documentation on failures, aside
from evidence provided here, makes identification of
reasons for failure difficult. Hornik and others have
displayed an exceptional willingness to examine the
reasons for unsuccessful randomization, and to build
on that information to develop better methods of
analyzing the resultant observational data.






Given the number, quality, and variety of field
experiments which we have been able to identify,
the general contention that experiments are
impractical is a bit underwhelming. There are,
however, some other important feasibility issues
which have also been used to justify not randomizing.
The more typical ones are outlined in the other
sections of this paper.

Remarks. That a notable number of randomized
experiments have been mounted in the field does not
demonstrate the feasibility of experimental tests
under all or even most social conditions, of course.
The examples do, however, serve as valid evidence
against the broad contention that rigorous appraisals
of the effect of a social program are rare or
impossible. They also serve as a basis for examining
conditions under which controlled tests appear to be
most readily mounted. For example, many such tests
compare the effects of various material products,
such as two different income subsidy plans, different
drugs, different sets of written instructions, and so
on, rather than the effects of social programs which
are based heavily on personal skills of program
staff, such as two rehabilitation programs for the
mentally ill. It is conceivable that experimental
tests of the latter sort are more difficult to
conduct because we do not know enough about designing
tests which are especially sensitive to staff skills
or which do not threaten the status of program staff.
Similarly, many experiments involve estimating the
effects of new social programs, while relatively few
are devoted to ongoing programs. That strong
traditions, beliefs, and ingrained practice common to
ongoing programs are less conducive to planned
evaluations has been recognized by legal researchers
(e.g., Hans Zeisel), by medical researchers (e.g.,
Thomas Chalmers), and others. But this is not to say
that experimental tests of less material programs, or
of ongoing programs, are undesirable or impossible.
It is to say that considerably more effort must be
expended in mounting experimental tests of ongoing
programs and that the efforts may not pay off in a
successful test if regular program staff resist the
idea of evaluation.

A different reason for failure of an experiment
concerns the public's rejection of an unfamiliar
idea: randomization. Some good experimental tests
have been undermined by premature and naive
acceptance of randomization as well as by premature
and naive rejection. Public education is likely to
help make acceptance more informed. But in addition,
some empirical work by program evaluators on related
determinants of acceptability can be justified.
Hendricks and Wortman (1974), for example, are
examining the effects of a program candidate's
assigning himself randomly to program condition,
because assignment by program staff or by an
impersonal institution appears at times to generate
resistance. These small laboratory experiments and
case studies such as Conner's (1974) are likely to be
helpful in generating more realistic approaches to
handling the problem in the actual field experiment.

We consider the matter of randomization in
more detail in a later section of this paper.

4. Pilot Feasibility Experiments as a Test of
Feasibility

The suggestion just made, that examining
precedent can be helpful in making crude judgments
about the feasibility of an experiment, is a
reasonable one. But it is considerably less direct an
approach than is generally necessary. Now one
relatively uncommon but quite direct approach to
the matter is to mount a live pilot experiment, a
little field test to appraise feasibility of the
full-blown field experiment.

Such a pilot feasibility study may be a unified
endeavor as we've implied, a dress rehearsal before
a live but very limited audience prior to the main
test. This is not a common tactic in the social
sciences, where the exuberance of a youthful science
and short time frames may prevent its more frequent
use. But it is not uncommon in other arenas,
including medical experimentation. The more common
approach, of course, is to set up a number of small
tests or studies prior to the main study to assure
feasibility of special features of a field test.
That is almost always done as a part of the natural
process of program development, and it is without
doubt essential. But this more fragmented process,
while it may assure that the separate ingredients of
an experiment are of sufficient quality, usually
tells us little about the resulting mixture.

To be more concrete, consider what a pilot
feasibility experiment may tell us about problems
which can (and do) occur in major field experiments.
The chronic problems, judging from precedent, bear
on: the target population; the response variable;
the treatment delivery; and randomization. Except
for the last, difficulties with each item have
surfaced in most program evaluations, randomized or
otherwise.

4.1 Target Population

The chronic problem here is that members of
the target population, those individuals or
institutions which are supposed to avail themselves
of a novel program, are not well identified. That is,
one usually has a general idea of who might be
interested, deserving, and so on, but prior to a
major field experiment it's usually not at all clear
how one is supposed to identify those individuals
quickly, screen them, involve them in the research,
and so on.

So, for example, a "need" is declared, a program
developed, and a field test mounted without knowing
exactly who is needy and how to get at them. The
so-called career education programs were beset by
this problem for at least two years before really
coming to grips with it. The absence of any hard
information on which adolescents were needy or even
interested in career education, and the difficulty
of setting up a good system for referring adolescents
from their local high school to an as yet untested
and poorly understood novel program, generated
problems in assuring a decent sample size for the
treatment groups, much less for the control
condition.

Exactly the same kind of problem occurred at
about half the sites of the Section 222 experiments.
These admirable tests ran straight into the problem
of recruiting and selecting individuals for
treatment, i.e., day care, because the size and
nature of the relevant local target population were
not sufficiently well known. Referral services for
the new program had to be set up with great effort,
since neither physicians nor hospital discharge
officers were knowledgeable about either the new
program or the fundamental need for randomization.
The problem appears to have stabilized during the
first year of the experiment's conduct.

Remarks. One of the best conditions under
which a randomized experiment can be established
is one in which the demand for services, the
number of members of the eligible and interested
target population, greatly exceeds the supply of
treatment facilities. With a new, small program the
condition of limited supply is often met naturally.
But the level of demand can only be known through
needs assessment surveys or through pilot tests of
the kind suggested here. The market needs to be
identified well before the experiment and to be
expanded where necessary to enhance the feasibility
of a randomized trial.

4.2 Response Variables

In the behavioral and social sciences at any
rate, the character of a dependent variable,
especially a newly developed test or rating system,
is often insufficiently documented prior to a major
experimental test. The problem is chronic and, more
importantly, critical in fair estimation of program
effects. In brief, the response variable's relevance
to the treatment program is often quite low, despite
its "face validity." And it is through research prior
to the main experiment that the most direct evidence
can be obtained and the best systems for assuring
relevance can be set up.

For example, standardized achievement tests
have often been used as a response variable in
appraising the impact of compensatory education
programs. But, in fact, many such programs do not
focus on academic achievement of deprived students
even when they are supposed to do so. Even when they
do, students in the needy category often perform so
poorly that the test is simply insensitive to their
true level of achievement and to changes in that
level. To be sure, the test result may also be
affected notably by local testing conditions which
produce anxiety, apprehension, or confusion among
students, factors which are bound to depress test
scores generally. Similarly, in health-related
programs, measures of (say) functional mobility of
the aged or arthritic may be quite reliable when made
with well trained raters. But in the field, where
conditions of measurement are not ideal, even the
well trained may yield ratings which contain a good
deal of random variation or systematic irrelevance.
And if the program itself directs only a little
attention to improving functional mobility, then
unreliability will make the subtle effect difficult
or impossible to detect.
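
The cost of unreliability can be made concrete with a rough sample-size calculation. The sketch below rests on stated assumptions: a two-group comparison of means, a normal approximation, and the classical result that a standardized effect measured on fallible scores shrinks by the square root of the reliability. The function name and the example figures are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(true_effect_sd, reliability, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided test of two means.

    true_effect_sd: program effect in true-score standard deviation units
    reliability:    proportion of observed-score variance that is true variance
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    # Measurement error inflates the observed SD, shrinking the
    # standardized effect that the experiment can actually see.
    observed_effect = true_effect_sd * sqrt(reliability)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(2 * (z / observed_effect) ** 2)


if __name__ == "__main__":
    for r in (1.0, 0.8, 0.5):
        print(r, n_per_group(0.2, r))
```

Under these assumptions the required sample grows in inverse proportion to reliability, so a subtle effect measured with a rating of reliability 0.5 needs roughly twice the sample that a perfectly reliable measure would need. A pilot test is one way to learn the reliability figure before the main study is fielded.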

Now aside from the normal precautions to assure
reliability of measurement and relevance of the
response variable, which incidentally are often not
taken, a pilot feasibility test appears to be a
decent approach to accommodating the problem. Prior
to the main field test one obtains all the evidence
one can on the sensitivity of the measures. And the
test should help one understand the kinds of
quality-control devices and record management tactics
which should be employed in the main study to assure
the integrity of the data.

4.3 Treatment Delivery

If the main field experiment directs attention
to impact where the program is delivered, it is
natural to focus a pilot field test on the matter of
actual delivery.

That is, during the pilot test phase, the kinks
in the delivery system are worked out. Mechanisms
are developed to assure that an individual who is
supposed to receive an income subsidy does indeed
receive it and no other. A verification system is set
up to assure that students who are supposed to
participate in an activity do indeed do so, and so
on. This basic requirement that one establish
procedures for monitoring delivery seems trivial. But
in fact it is not always a simple matter. The New
Jersey Negative Income Tax Experiments generated a
grand-jury hearing when it was discovered by
journalists that, unbeknownst to the experimenters,
some treatment group subjects were receiving multiple
subsidy payments to which they were not entitled.
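
A verification system of the kind described can be quite simple in outline. The sketch below is hypothetical throughout: it assumes the evaluator holds a roster of assignments and can obtain a ledger of payments actually made in one period, and it reports the discrepancies between the two. It does not describe the procedures used in any of the experiments mentioned.

```python
from collections import Counter


def audit_delivery(assigned, payments):
    """Compare the assignment roster with the payments actually made.

    assigned: dict mapping person id -> subsidy plan the person should get
    payments: list of (person id, plan) pairs recorded for one pay period
    """
    times_paid = Counter(person for person, _ in payments)
    return {
        # assigned to a plan but received nothing this period
        "missing": sorted(p for p in assigned if times_paid[p] == 0),
        # received more than one payment this period
        "duplicate": sorted(p for p in assigned if times_paid[p] > 1),
        # paid under a plan other than the one assigned
        "wrong_plan": sorted(
            {p for p, plan in payments if p in assigned and assigned[p] != plan}
        ),
        # paid although not on the roster at all
        "not_enrolled": sorted({p for p, _ in payments if p not in assigned}),
    }


if __name__ == "__main__":
    roster = {"a": "plan 1", "b": "plan 1", "c": "plan 2"}
    ledger = [("a", "plan 1"), ("a", "plan 1"), ("c", "plan 1"), ("z", "plan 2")]
    print(audit_delivery(roster, ledger))
```

Running such a check each period during a pilot test would show how often each kind of lapse occurs before the main experiment depends on the delivery system.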

A second chronic problem concerns the
individual's willingness or attentiveness in receipt
of treatment when assigned to the treatment
condition. For example, in the Kaiser Permanente
experimental tests of multiphasic screening, many of
the individuals assigned to the free screening
program failed to turn up for their periodic
examination. The Kaiser staff, interested in the
preventive benefits of screening and not in turnout
rate, mounted an intensive effort to encourage
participants to come for screening. The battery of
telephone operators who furnished oral reminders
jacked up the participation rate to a stable 65% for
the ten-year period of the experiment. A similar
encouragement strategy was developed during the
course of experiments to evaluate the children's
television program "Sesame Street."

Here, the encouragement strategies were
developed on line, i.e., during the conduct of the
main experiment. It's likely that at least some
problems could have been reduced earlier through
pilot tests.

4.4 Randomized Assignment and Maintenance of
Condition

The preceding section dealt with maintaining
a regimen, and here we consider both that maintenance
and the assignment process. The idea of the pilot
test in this instance is to anticipate and
accommodate problems which we expect will otherwise
arise in the main test, to develop some ideas about
the problems' severity, and to develop and test
strategies for accommodating the problems.

The pilot test looks at the question "How can
randomized assignment be accomplished best?" and
proceeds to examine tactics for enhancing feasibility
of randomized assignment in subsequent main field
tests. So, for example, the Diet Heart Feasibility
Study helped to determine if indeed randomized
assignment of individuals to alternative cholesterol
reducing diets was managerially possible, ethically
acceptable, and socially innocuous. In a more
elaborate pilot test, various public arguments for
randomization might be tried out, various mechanical
techniques for achieving randomization unobtrusively
might be tested, and various systems for controlling
the inevitable lapses in randomization might be
examined.
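
One mechanical technique, offered here only as a sketch, is to draw up the assignment list in advance from a recorded seed, so that the list can be regenerated later and compared with what actually happened in the field. The blocked scheme, the function names, and the diet labels below are assumptions made for illustration; they are not taken from the Diet Heart study or any other experiment cited.

```python
import random


def assignment_plan(n, conditions, seed):
    """Pre-generate n assignments in shuffled blocks.

    Each block contains every condition once, so group sizes stay nearly
    equal no matter when enrollment stops. The same seed always reproduces
    the same plan, which lets an auditor regenerate it independently.
    """
    rng = random.Random(seed)
    plan = []
    while len(plan) < n:
        block = list(conditions)
        rng.shuffle(block)
        plan.extend(block)
    return plan[:n]


def lapses(plan, actual):
    """Positions in the intake sequence where the field departed from plan."""
    return [i for i, (p, a) in enumerate(zip(plan, actual)) if p != a]


if __name__ == "__main__":
    plan = assignment_plan(8, ["diet A", "diet B"], seed=3)
    actual = list(plan)
    actual[5] = "diet A" if plan[5] == "diet B" else "diet B"  # one lapse
    print(plan)
    print("lapses at intake positions:", lapses(plan, actual))
```

A pilot test could run such a scheme on a small intake stream and count the lapses, giving some idea of how severe the problem will be in the main test.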

Maintaining individuals, once assigned, in the
alternative levels of treatment, or in alternative
treatment regimens, or in the control condition if
there is one, is important, of course. And in the
absence of any prior information about alternative
methods of doing so effectively, a pilot test of a
chosen approach seems prudent. For the treatment
conditions, systematic encouragement and reminders
form an effective tool, and their worth or
worthlessness should be evident in a pilot test. For
members of a no-treatment control condition,
additional incentives for participating in the
experiment may be warranted (see remarks below).
Those may be tangible or intangible, but in either
case, their usefulness ought to be established before
the main experiment is put into the field.

4.5 Summary

To summarize, the most direct way to establish
the feasibility of a large field experiment is to
mount a pilot field experiment. That smaller test
can help one to identify unexpected problems, to
try out solutions to the problems we know are
chronic, and to accumulate information which is
often essential to the quality of a major field test.
With very novel programs whose character is not
well understood by the public, whose target
population is difficult to reach, and whose effects
may be subtle and virtually undetectable using
off-the-shelf measurement devices, such a pilot test
is essential.

With programs backed by intensive longer
term research on target populations, on response
variables, and so on, the pilot test is less crucial.
It becomes considerably less crucial when the
experimenter already knows a good deal about mounting
very high-quality field surveys in general, and
field experiments in particular.

A pilot test may itself not be practical when
time is short, resources are slender, and a
conservative approach is not warranted. In that case,
one can only try to work out tentative solutions for
some of the problems we've identified and be
ready to improve them during the main experiment
if they prove inadequate.

5. Direct Constraints on Feasibility of
Randomized Tests

There are a variety of difficulties which can be
anticipated in assessing feasibility of an
experiment. Both the difficulties and some tactics
which can be used to resolve them are discussed in
the following remarks. Since both irrelevant factors,
i.e., red herrings, and pertinent factors may
influence judgments about feasibility, both kinds are
discussed here.

5.1 Randomization and Selection

Basic misconceptions about randomized
experiments can affect judgements about their
feasibility. We consider one such misconception here
in part because it emerges almost invariably in
discussions with lay audiences about whether an
experiment can or should be done.

The misconception concerns the idea that
treatment group members must be selected randomly
from a prescribed population. This is often
impossible, especially where individuals must
volunteer for the program, and so, the argument runs,
one must reach the judgement that a randomized
experiment is impossible.

Now strictly speaking, randomization in an
experiment refers to the assignment of individuals
from a pool of eligible candidates to program
variations or alternatives. Virtually nothing about
how the initial pool of candidates was actually
constructed need be implied.

For example, candidates who apply for admission
to a manpower training program necessarily include
only those individuals who have heard about the
program; many have low salaries and poor skills,
which give them some incentive to apply for
admission. The resultant pool of applicants will not
ordinarily be representative of the total population
of people eligible for manpower training.
Nonetheless, we can still conduct a legitimate
experiment, randomly assigning applicants to training
variations, in order to compare the relative effects
of those variations. It is the random assignment
process which is crucial to the unbiased estimation
of relative effects on the candidates at hand. This
is not to say, however, that the process of
constructing the pool of candidates for an
experimental test is unimportant. Indeed, it is
important in that it determines how generalizable the
experimental results will be. Suppose, for example,
that only early applicants for a training program
constituted the basic pool of candidates. After
randomly assigning members of the pool to program
variations, we might find that one particular variant
of the program, say skill training and general
education, was more effective than skill training
alone in increasing job opportunities. It is quite
possible that this result is not generalizable to
late applicants to the program, although it is
legitimate with respect to early applicants. Those
who apply late may be delayed by their inability to
read or to monitor governmental services, or for
other reasons, and they may profit greatly from
general education components added to their skills
training. Making generalizations about the program's
impact on groups not represented in the experiment
can be hazardous for this and other reasons. So some
experimental tests involve not only random assignment
of individuals to program variants but random
selection of individuals from a population of
eligible candidates, as well. Randomized selection,
of course, is not the only determinant of
generalizability in evaluations, experimental or
otherwise. Others are examined briefly below.
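
The distinction between the two operations can be put in a few lines of code. The sketch below is illustrative only; the function names, the applicant labels, and the two training variants are hypothetical. Random selection draws the pool from a listed population and bears on generalizability, while random assignment deals the members of whatever pool is at hand out to program variants and bears on unbiased comparison.

```python
import random


def select_pool(population, pool_size, rng):
    """Random SELECTION: draw the candidate pool from a listed population."""
    return rng.sample(list(population), pool_size)


def assign(pool, variants, rng):
    """Random ASSIGNMENT: deal pool members out to the program variants."""
    shuffled = list(pool)
    rng.shuffle(shuffled)
    return {v: shuffled[i::len(variants)] for i, v in enumerate(variants)}


if __name__ == "__main__":
    rng = random.Random(11)
    # A self-selected pool of volunteers: no random selection took place,
    # yet random assignment within the pool is still perfectly possible.
    volunteers = [f"applicant-{i}" for i in range(12)]
    groups = assign(
        volunteers,
        ["skill training", "skill training + general education"],
        rng,
    )
    for variant, members in groups.items():
        print(variant, len(members))
```

An experiment on volunteers uses only the second function. An experiment that also aims to generalize to a known population would apply the first function to that population's listing and then the second to the resulting pool.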

5.2 Randomized Assignment to Control: Testing
Treatment Variations

One of the most frequently mentioned obstacles
to the conduct of randomized tests concerns the
random assignment of individuals to treatment or
control conditions. There are at least four issues
implicit in arguments about this matter, and we
consider each in turn. The first is a matter of
design of the evaluation and involves a shift in the
question which the experiment is supposed to answer.
This option is considered here, and other options
which may be taken to determine or more directly
enhance feasibility are discussed in the next three
sections.

It is clear that in some cases, it will be illegal,
unethical, or otherwise imprudent to assign some
members of a target sample to a "control"
(no-treatment) condition. Nonetheless, it is still
possible to conduct randomized experimental tests
without losing sight of the basic aim: to understand
the nature of program effects. Specifically, we can
compare the relative effectiveness of program
variations using randomized tests where it is
important to determine if some of those variations
are more effective than others.

It may in any event make more scientific sense
to test variations. One would often like to know
how response varies with different levels of
intensity or elaborateness of treatment, not merely
what the effect is at one level. One would often like
to know whether a more expensive program or program
component is that much more effective than a cheap
and different program or component which is
advertised to have roughly the same effect. In the
latter case, the economic justification for testing
variations is also clear.

To be more specific, consider a special police
training program designed to reduce assaults on
police. It may be funded well enough to accommodate
all eligible candidates. Under this condition,
policemen who are randomly allocated to a no-program
(control) condition may object to their assignment
and resist participating in an experiment. Managerial
interest in and logistical support for a control
condition may not be available for a variety of
reasons, despite the fact that it is not at all clear
that the program itself will be effective. To deal
with these problems, it may be possible to test
several program variations against one another, or to
test expensive elements of the program against one
another, rather than to try to test the complete
program against control conditions. This strategy
will at least provide an unbiased estimate of the
relative impact of important training variations (in
reducing assaults, say). And if the experiment
examines expensive program elements, we will be able
to determine which of those elements are least useful
in reducing assaults. Not using a control condition
forfeits the option of estimating program effects on
assault relative to no program at all. But the option
itself may be useless in the sense that "no program"
is not a politically feasible alternative.

The comparison of program variations need not be
justified solely on grounds that control group
members may feel deprived. There are important
ethical reasons for using a variations design, which
are discussed below (see Ethical Grounds for
Criticism). And there are still other cases in which
comparisons among both variations and the control
condition are warranted. For example, in evaluating
the impact of a Manpower Development and Training
Act program in Virginia, Brazziel (1967) suggested
that, because the vocational program could not
accommodate all eligible candidates, the candidates
be randomly assigned only to program and no-program
conditions. In addition, however, he did take the
opportunity to develop a major program variation,
general education plus vocational training, against
which the regular program could be compared.
Eligible candidates were then assigned to one of
three conditions: vocational training, general
education plus vocational training, and a control
condition. In the event of failure of the vocational
program versus no-program comparison, a comparison
of the program variations would still be useful to
determine if the program variation (general
education plus vocational training) leads to
trainees who are better equipped to adapt to
different job requirements than those who receive
vocational education alone.
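
A three-condition allocation of the kind just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used in the Virginia study: the function name `assign_to_conditions`, the condition labels, the 30 made-up candidate IDs, and the seed are all introduced here for the example.

```python
import random

def assign_to_conditions(candidates, conditions, seed=0):
    """Randomly allocate candidates to conditions in nearly equal numbers."""
    rng = random.Random(seed)      # fixed seed so the allocation can be audited
    shuffled = list(candidates)
    rng.shuffle(shuffled)
    # Deal the shuffled list out round-robin; group sizes differ by at most one.
    k = len(conditions)
    return {name: shuffled[i::k] for i, name in enumerate(conditions)}

# Hypothetical labels for the three conditions and 30 made-up candidate IDs.
conditions = ["vocational", "general_plus_vocational", "control"]
groups = assign_to_conditions(range(30), conditions)
for name, members in groups.items():
    print(name, len(members))
```

Because every candidate has the same chance of landing in each condition, both the program versus no-program comparison and the comparison between the two program variations rest on randomized groups.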

5.3 Randomization and Differential Effects of
Treatment

One of the special constraints on many program
evaluations is that different types of people may be
in need of treatment, and effectiveness of treatment
may vary with person type. Accommodating that
constraint is not difficult, provided that the
person type can be accurately identified. Two cases
are considered below. In the first, we focus on
experiments which reveal whether indeed there is an
interaction between person type and treatment type.
In the next section, we focus on the case in which
randomization and need for what is believed to be
effective treatment are at issue.

Even a cursory investigation of textbooks on
experimental design reveals strategies which can be
used routinely to determine how different types of
people are affected differentially by a program.
Given these general designs, it is up to the
evaluator and the program developer to speculate on
what attributes of people might interact with the
program's effects and to decide upon a reliable way
of discovering whether people have those attributes.
The speculation may be based on anecdotal
information as well as more structured judgments of
the informed program developer. And if one can
measure those attributes well before the experiment,
they can be incorporated into a randomized block
design which will permit us to detect the
interaction when it occurs.
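
The allocation step of a randomized block design can be sketched as follows. It is a minimal illustration, assuming that person type has already been measured before assignment; the names `blocked_assignment` and `cell_counts`, the labels `type_A` and `type_B`, and the 20 made-up person IDs are hypothetical.

```python
import random
from collections import defaultdict

def blocked_assignment(people, treatments, seed=0):
    """people: iterable of (person_id, person_type) pairs.
    Randomizes separately within each person type (block)."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for person_id, person_type in people:
        blocks[person_type].append(person_id)
    assignment = {}
    for person_type, ids in blocks.items():
        rng.shuffle(ids)
        for i, person_id in enumerate(ids):
            # Alternate treatments down the shuffled list to balance each block.
            assignment[person_id] = (person_type, treatments[i % len(treatments)])
    return assignment

def cell_counts(assignment):
    """Number of people in each (person type, treatment) cell."""
    counts = defaultdict(int)
    for cell in assignment.values():
        counts[cell] += 1
    return dict(counts)

# Hypothetical sample: 8 people of one type, 12 of another.
people = [(i, "type_A" if i < 8 else "type_B") for i in range(20)]
assignment = blocked_assignment(people, ["program", "control"])
print(cell_counts(assignment))
```

Randomizing within blocks guarantees that every person type appears under every treatment, so the type-by-treatment interaction can be estimated as a difference between the within-type treatment differences.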

Such designs have often been used by
sophisticated analysts. Results of some California
Youth Authority experiments, for example, suggest
that delinquent boys who are socially assertive do
have the capacity to work in and benefit from
confrontive group treatment, while boys sensitive to
threats fare better under more supportive treatments
which de-emphasize confrontive, probing behavior
(Knight, 1970). At a cruder level, the Health
Insurance Experiment mounted programs in different
sites to assure that if effects of insurance (say)
on health services utilization vary with local
access to Health Maintenance Organizations, or with
site-to-site differences in use of health services,
the experiment will detect those interactions. Good
experimental tests of clinical treatments regularly
incorporate qualitative characteristics of clients
into designs not only to detect differential effects
of treatment but also to anticipate problems in
field implementation of the program. The Cohen and
Krause (1971) experiments on therapy for wives of
alcoholics, for example, deliberately included
demographic variables to accommodate the known
tendency of clients from upper socioeconomic classes
to seek and begin treatment more quickly than
individuals from the lower-income brackets, to be
more accessible to program staff, to be more easily
engaged in treatment, and so forth.

By ignoring the possibility of such
interactions, of course, we run the risk of not
detecting the program's main effects. One might
find, for example, that there is no difference
between two programs, when in fact one program
affects type A individuals dramatically in one
direction while the second program affects them
equally in the opposite direction. Conversely, we
also run the risk of adopting a program for general
use (on the basis of large average effects) when in
fact the effects differ considerably, depending on
characteristics of particular subgroups in the
target population.
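
A small worked example shows how an interaction can hide a difference between programs. The numbers are invented for illustration and come from no study; the example further assumes two person types of equal size, with type B responding in the reverse pattern to type A.

```python
# Hypothetical mean outcome changes, equal numbers of people in every cell.
cell_means = {
    ("program_1", "type_A"): +5.0,
    ("program_1", "type_B"): -5.0,
    ("program_2", "type_A"): -5.0,
    ("program_2", "type_B"): +5.0,
}

def program_mean(program):
    """Average over person types, which is what an unblocked comparison sees."""
    values = [m for (p, _), m in cell_means.items() if p == program]
    return sum(values) / len(values)

def difference_within(person_type):
    """Program 1 minus program 2 for one person type."""
    return (cell_means[("program_1", person_type)]
            - cell_means[("program_2", person_type)])

overall_difference = program_mean("program_1") - program_mean("program_2")
interaction = difference_within("type_A") - difference_within("type_B")

print(overall_difference)           # 0.0
print(difference_within("type_A"))  # 10.0
print(difference_within("type_B"))  # -10.0
print(interaction)                  # 20.0
```

The overall comparison shows no difference at all, while the within-type comparisons show large and opposite effects; only a design that records person type can reveal this.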

5.4 Randomization and Need for Treatment

The preceding section focused on changing the
character of the treatment and the evaluation
question to accommodate the problem of resistance to
randomized assignment to treatment and control
condition. Here the focus is also on changing the
evaluation design, but alteration is made to
screening tests for the target population rather
than the program. The objective is the same: to
avoid or attenuate a possible local constraint on
randomization, and so to enhance feasibility of an
experiment.

Randomization is most appropriate when the
effect of the treatment variation on the sample at
hand is unknown. We recognize that the effect is
unknown from the judgements of experts. They regard
the evidence as equivocal and, in the absence of any
other information, so usually must we. Now this
immediately suggests that as a general strategy in
identifying the target population to which the
program is most relevant, one ought to classify
possible recipients into three classes: those who,
most experts would agree, need the program; those
whose need is debatable or ambiguous; and those who
clearly do not need it at all. It is the middle
group which is most pertinent to randomized
assignment, there being no other rational basis for
providing treatment.

The best example which we have been able to
find to illustrate this perspective is the British
Myocardial Infarction study, mounted to determine
whether home care or hospital care is a better
vehicle for treatment of a certain class of heart
attack victims. The serious condition of some
patients, physicians said, clearly warranted
intermediate-term hospital care; for others, such
care was very likely to be a waste of time. The gray
area of need included patients for whom a confident
judgement could not be made, and it was members of
this group who were assigned randomly to home or
hospital care in the experiment. The group had until
then almost invariably gone to hospital rather than
home, since hospitalization costs were paid,
physicians had been very conservative in their
judgements, and for other reasons. The experiment,
carried out successfully, was useful in obtaining
evidence that home treatment was no less effective
than hospital, and in obtaining data useful for
economic planning and management of a broadened home
care system.

An experiment of this type tells one virtually
nothing about the impact of the program on those who
are said to be really needy. But it does do so for
the ubiquitous marginal group. If the experiment is
informative for this group, then the same theory
might be extended to an adjacent group, said to be
needy, but now constituting a new marginal group, to
see if the treatment has some impact on them.

Remarks

Even with initial prior agreement by expert
judges to label the marginally needy, the actual
experiment may fail because the judges, on second
thought, may find they can really assign very few to
the marginal group.

This appears to have occurred in judicial
experiments, where prior judicial agreements to
label those for whom a sentence is quite arbitrary
were abandoned during the course of the research. It
appears to have occurred in experimental tests of
parent effectiveness training where the agreement
was subverted by staffs with a strong vested
interest in the outcome of the experiment. And it
has occurred elsewhere. The problems and potential
solutions in these instances might be better
identified in a pilot field test rather than in a
large-scale effort.

5.5 Cost of Randomized Experiments

We often hear the claim that experiments are
rather expensive and time consuming. Yet the
detailed costs of most program evaluations,
experimental or not, are often poorly documented,
suggesting that contentions about expense cannot be
easily verified. The data necessary to permit a fair
comparison between, say, a randomized test and a
very well thought out quasi-experimental test are
simply unavailable. To be sure, some evaluators have
laid out the costs of evaluation well (e.g., in the
Taiwan Fertility Control research), but most have
not. More generally, there exist no special
accounting conventions for costs of program
evaluation and no coherent body of statistical data
on costs. The National Institute of Education, in
fact, has had to develop special contracts to lay
the groundwork for good accounting practices for
documenting the costs of the experimental evaluation
of the Career Education programs which it supports.

The only hard comparative data of which I am
aware, bearing on the costs of experiments versus
other methods of impact evaluation, stem from the
NIE effort. Randomization appears to have required
much less than a 1% increase in evaluation budgets,
the increase being spent on payments to control
group members and to experimental group members in
return for their cooperation. The data are based on
Experience-based Career Education Programs which
shifted from their plans to conduct nonrandomized
assessments (covariance analysis) to randomized
tests of their programs.

If we examine other precedents more closely, it
becomes obvious that not all experimental tests of
social programs need be costly in absolute terms.
Especially in education, the feasibility and utility
of small, economical experimental tests of less than
a year's duration have been demonstrated repeatedly.
For example, Goodwin and Sanders (1972) required
less than three months to collect evidence on the
effectiveness of tape-recorded curriculum units for
use on school buses; Zener and Schnuelle's (197?)
assessments of alternative career education programs
for high schools took less than 12 months. The Welch
and Walberg (1972) experiments on dissemination of
teaching materials for Project Physics (Harvard)
required less than 12 months and $30,000 to
complete. Other economical experiments in evaluation
of curriculum and teaching strategies are described
in Riecken et al. (1974), Gage (1963), and
elsewhere.

Experiments especially need not be costly if
the treatment is of short duration and if the time
interval between imposition of the program and the
observation of the program recipient's response is
small. For example, in the Manhattan Bail Bond
Experiment (Botein, 1965), the program consisted of
a bail waiver for individuals accused of having
committed certain crimes, followed within a year by
observation of a criterion variable: failure of the
accused to appear for trial. Similarly, experimental
evidence regarding effects of various voter
registration campaigns was available soon after the
new campaigns were tried (Gosnell, 1929). The
effects of alternative communication strategies are
available soon after the subjects' receipt of
information; for example, the classic wartime
propaganda and communications research of Hovland,
Lumsdaine, and Sheffield (1949). In marketing and
census research, information about the relative
effectiveness of various methods of eliciting and
transmitting valid data from respondents can be made
available routinely within six months after survey
programs are initiated.

This is not to say, however, that some
experimental tests have not been expensive and time
consuming in absolute terms. Those programs which
are expected to have long-term effects or to have
effect only after a long period of treatment can be
particularly expensive. Staff required for
evaluation must be maintained, and decisions about
wholesale adoption of the experimental program are
delayed until data are obtained and analyzed. The
Negative Income Tax Experiment is an expensive (more
than $12 million) and long-term (6 years) research
effort, where time is required primarily to "fix"
the experimental treatment (i.e., to get people
familiar with the welfare subsidy) and to determine
long-term effects of the subsidy. Experimental tests
of criminal reform programs, of rehabilitation
strategies for the mentally ill, and of some
education programs are time consuming, not only
because the time necessary for treatment can be
long, but because it is the long-term rather than
short-term effects that are most relevant to program
development.

At least with respect to absolute size of
investments, the requirements of experimental tests
vary considerably with the particular developmental
stage of the program, the adequacy of short-term
effects as an indicator of program success, and the
time necessary for completing the treatment program.
There are at least two important issues, however,
which suggest that we cannot be content with
discussion of absolute costs: the costs and benefits
of lower quality appraisals, and the intermediate
products of experimental evaluation. The cost of not
doing an experiment will often be high, simply
because the data stemming from observational studies
will usually be equivocal, and the cost of wrong
decisions (or no decisions) based on equivocal data
can be high. Unfortunately, there have been few
formal analyses of the costs to society of not doing
evaluation, of doing equivocal evaluations, or of
mounting rigorous tests of social programs. The
better (and perhaps the only) cost/benefit analyses
of experiments are in the fertility-control area
where, for example, the Population Council has
succeeded in obtaining fairly good information on
the cost and impact of data stemming from its
fertility-control research.

On the other hand, there has been a bit more
progress in identifying the benefits of evaluation
and of staging research to obtain usable products
periodically before the experiment's completion.
Before program effects appear, the experiment often
provides better information about the program's
target group than was previously available. Such
baseline data often yield more accurate
characterizations of the target group than were
available at program inception, and consequently may
be helpful in designing and launching subsequent
programs. See Field and Orr (1975) for remarks on
this in the context of the Housing Allowance
Experiments and the Negative Income Tax Experiments.



5.6 Accommodating Ethical Constraints on an
Experiment

Claims about the ethical aspects of
randomization generally take several related forms.
The contention that control (no-program) group
members are deprived of a program which might be
beneficial to them occurs often. A mirror image of
this complaint is that the program recipient is
subjected to risk by his participation because a
novel program may have unpredictable negative
effects. A second broad class of criticisms concerns
manipulation of human beings, an activity which may
be objectionable in principle. A related issue
concerns the notion that the research subject is
being exploited regardless of the costs and benefits
of the experiment; that is, that he receives little
in the way of direct reward for his participation
and lacks even a guarantee that the information he
provides will not be used improperly.

Some experiments can be judged to be unethical
for these reasons. But this does not imply that all
experiments are unethical, any more than one
high-quality experiment implies that all are of high
quality. The following remarks capitalize on what we
already know about fairly universal if crude ethical
standards and about potential conflicts between
those standards and experimentation. They focus on
the question of how to design the experiment within
the framework set by good ethical standards.

Failure to experiment as unethical. A frequent
claim about randomized experiments is that some
members of the social program's target population,
the control group members, must be deprived
(randomly) of a benefit. The claim assumes, of
course, that the treatment is actually beneficial,
and if it is known to be beneficial, then the
experiment may well be unethical. But the aim of
most experiments is to discover whether there is a
detectable program effect; we may not need an
experiment at all if the impact is already
understood. By restricting randomization to programs
about which we are in doubt, we avoid the ethical
dilemma (or accusation) of depriving an individual
of a benefit. There can be no benefit if the program
is useless, and often we cannot show if it is useful
without an experiment.

A related line of argument here is that a
failure to discover whether a program is effective
is unethical. That is, if one relies solely on
nonrandomized assessments to make judgments about
the efficacy of a program, subsequent decisions may
be entirely inappropriate. Insofar as a failure to
obtain unequivocal data on effects leads to
decisions which are wrong and ultimately damaging,
that failure may violate good standards of both
social and professional ethics (Rutstein, 1969).
Even if the decisions are "correct" in the sense of
coinciding with those one might make based on
randomized experimental data, ethical problems
persist. The right action taken for the wrong
reasons is not especially attractive if we are to
learn anything about how to effectively handle the
child abuser, the chronically ill, the poorly
trained, and so forth.

Design of Ethical Experiments

There will always be cases in which the use of
a no-program control condition conforms readily with
professional ethics. That is, there is agreement
that program effectiveness is ambiguous, that the
available data are insufficient for making a
judgment about its quality, and hence an experiment
is ethically justified. But a public ethic or
standard may deviate from this notably; and it may
become necessary to adopt some strategy for either
altering that public ethic or adjusting the design
to accommodate it.

Changing a public ethic is usually impossible
with the time available to mount experiments.
Nonetheless, some short-term approaches have been
tested. Some rely heavily on the use of the media to
enhance the reading public's understanding of the
process. (That some journalists and science writers
can effectively translate the matter into lay terms
is readily evident from articles by Alan Otten in
the Wall Street Journal, Kotulak in the Chicago
Tribune, P.C. Gilmore in the New York Times, and
elsewhere. Alice Rivlin has written several articles
on the Negative Income Tax Experiment for the
Washington Post and New York Times; other social
scientists have done so for other presses.)

More direct action is usually warranted,
including the construction of unobtrusive but
effective schemes for randomization, and fair sets
of instructions to assure that informed consent
requirements for participants are met. This area
does not seem to have received much in the way of
systematic research and development. The whole
matter of encouraging participation in an experiment
is still a very ill-documented area. The little
systematic research we've seen suggests that people
will find randomization more palatable if they are
party to the randomization process: they pick the
lottery number themselves rather than having someone
else do it. They will find it more palatable if even
being a member of a control group affords some
benefit (see Section 5.* and remarks below). They
will find it more palatable if there are intangible
benefits, such as increased self-esteem or decreased
anxiety or loneliness or boredom, by participating.

There will also be cases in which a randomized
test of a program versus a no-program control
condition is unethical. That a particular
experimental design is ethically unacceptable in a
particular evaluation of course implies nothing
about the acceptability of other randomized designs.
In fact, a variety of techniques have been developed
to reduce or eliminate conflicts between ethical
standards and evaluation needs.

One obvious device is to stage the introduction
of treatment so that one merely delays treatment for
individuals in the randomized control group. The
strategy is sometimes essential in any event because
many programs cannot accommodate all eligible
candidates immediately, and staged acceptance of
candidates is managerially justified. The control
groups may subsequently be reduced incrementally or
all at once, so long as the delay is sufficient to
permit useful comparisons between program
participants and nonparticipants. (See Chapter IV of
Riecken et al., 1974, for more detailed description
of this design and its limitations.)
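
Staged introduction can be sketched as a random ordering of candidates followed by admission in waves. This is a hedged illustration rather than the design detailed in Riecken et al.; the function names `staged_entry` and `comparison_groups`, the wave size, and the 40 made-up candidate IDs are assumptions.

```python
import random

def staged_entry(candidates, wave_size, seed=0):
    """Randomly order candidates, then admit them in successive waves."""
    rng = random.Random(seed)
    order = list(candidates)
    rng.shuffle(order)
    return [order[i:i + wave_size] for i in range(0, len(order), wave_size)]

def comparison_groups(waves, waves_admitted):
    """Treated = admitted so far; control = still waiting for a place."""
    treated = [c for wave in waves[:waves_admitted] for c in wave]
    waiting = [c for wave in waves[waves_admitted:] for c in wave]
    return treated, waiting

# Hypothetical program with room for 10 new entrants per wave.
waves = staged_entry(range(40), wave_size=10)
treated, waiting = comparison_groups(waves, waves_admitted=1)
print(len(treated), len(waiting))   # 10 30
```

Since the order of entry is random, those still waiting form a randomized control group for those already admitted, and every candidate is eventually served.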

"Playing the winner" is a related strategy, used
more often in biomedical research, to estimate
program effects with minimal deprivation to members
of the less effectively treated group. Here,
subgroups of candidates or individuals are assigned
to a program only as long as the outcome of
treatment is successful. When a failure occurs, the
very next subgroup or individual is assigned to the
control (or alternative treatment) condition.
Subgroups continue to be assigned to the control
group so long as no failure occurs. When it does,
the very next subgroup is assigned to the first
treatment. And so on. This strategy is a more
complex one, but recent analytic work shows that it
can be very effective when success or failure
becomes evident quickly and when switches can be
accomplished easily (see, e.g., Fushimi, 1973). The
strategy also requires that the "success" be readily
identified when it occurs, a demand which may be
difficult though not impossible to meet in some
settings.
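
The switching rule described above (stay with a condition after a success, move to the other after a failure) can be sketched as follows. This is the simple rule as stated in the text, not Fushimi's improved procedure; the function name `play_the_winner`, the arm labels, and the outcome sequence are hypothetical.

```python
def play_the_winner(outcomes, first="program", other="control"):
    """outcomes: iterable of booleans, one per successive unit, where True
    means the unit succeeded under whichever arm it was assigned to.
    Returns the arm assigned to each unit in order."""
    arm = first
    assigned = []
    for success in outcomes:
        assigned.append(arm)
        if not success:
            # A failure sends the very next unit to the other arm.
            arm = other if arm == first else first
    return assigned

# Made-up outcomes for six successive units.
arms = play_the_winner([True, True, False, True, False, True])
print(arms)
# ['program', 'program', 'program', 'control', 'control', 'program']
```

The rule needs each outcome to be known before the next unit is assigned, which is why it suits settings where success or failure becomes evident quickly.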

If delays in program participation are ethically
unacceptable and if program installation involves no
naturally occurring delays, then other strategies
can be used. Rather than think solely in terms of
treated versus untreated program candidates, for
example, it is often reasonable to change the
research question slightly to permit us to think
about comparing treatment variations, an option
already discussed in Section 5.2 above. Candidates
for the social program can be allocated randomly to
different levels of treatment, the lowest level
being a minimal ethically acceptable offering. This
idea has been used in both the Negative Income Tax
Experiment and Health Insurance Experiment, where
deprivation of economic benefits relative to current
social standards would be ethically unconscionable
despite its importance as an economic question. And
it has been used in critical medical studies such as
Rutstein's (1969) tests of cortisone against aspirin
in treatment of rheumatic fever. In these and other
cases, new programs are compared against the better
conventional ones rather than against no program at
all in order to satisfy both scientific and ethical
standards.




Similarly, experimental assessments of
components of a program rather than the total
program may also be possible when there is little
prior evidence on effects of the components, but
there is strong professional or societal belief that
the program is indeed effective. Physicians, for
example, are often confident that integrated
health-care systems are good and that if "total"
health is delivered to an individual, his health
will improve. Under these conditions, it may be
impossible to mount a fair test of the total
program. Instead, the evaluator might look for those
components of the program about which there is some
doubt as to their effectiveness. For example,
integrated health-care delivery systems in lesser
developed countries are being supported by the U.S.
Agency for International Development. To the extent
that nutrition, health care information, and the
like are regarded a priori as "a good thing," trying
to evaluate their total effect using a randomized
experiment may be a pointless exercise at this time.
Component-wise evaluation is not. No one knows, for
example, how paramedics should be chosen (monks,
midwives, [illegible], or village elders) and
trained to yield high treatment rates with minimal
cultural disruption. The situation presents us with
an opportunity to experiment with alternative
recruitment and training strategies even if we do
not obtain unequivocal data on the actual product
delivered by the trainees.

Often, criteria such as merit or need are
justified on ethical grounds for assigning
individuals to programs whose effects are not well
documented. And the meritocratic criteria lead some
critics to conclude that randomization is therefore
impossible on ethical as well as managerial grounds.
However, we can still obtain evidence based on
randomized tests if we capitalize on so-called
regression-discontinuity designs (Thistlethwaite &
Campbell, 1960). In the simplest case, one orders
all program candidates on the basis of need, then
assigns all obviously deserving candidates to the
program and all the obviously undeserving to the
control condition. Individuals in the ubiquitous
marginal group are assigned randomly to program and
control conditions; their marginality implies that
no reliable judgement can be made about the extent
to which they merit the program. A variant on this
design has been used successfully in the British
myocardial infarction studies, where marginally ill
individuals were randomly assigned to home or to
hospital care to satisfy ethical standards and to
discover whether hospital care resulted in any
notable improvements in their health.
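
The simplest case described above, ordering candidates on need and randomizing only the marginal band, can be sketched like this. It is an illustration under stated assumptions: the function name `assign_by_need`, the two cutoff parameters, the candidate labels, and the need scores are all hypothetical, and a real application would need expert agreement on where the cutoffs fall.

```python
import random

def assign_by_need(need_scores, clearly_needy, clearly_not_needy, seed=0):
    """need_scores: dict of candidate -> score (higher means needier).
    Scores at or above `clearly_needy` get the program, scores at or below
    `clearly_not_needy` get the control condition, and the band in between
    is randomized."""
    rng = random.Random(seed)
    assignment, marginal = {}, []
    for candidate, score in need_scores.items():
        if score >= clearly_needy:
            assignment[candidate] = "program"
        elif score <= clearly_not_needy:
            assignment[candidate] = "control"
        else:
            marginal.append(candidate)
    rng.shuffle(marginal)
    for i, candidate in enumerate(marginal):
        # Alternate down the shuffled list so the marginal band splits evenly.
        assignment[candidate] = "program" if i % 2 == 0 else "control"
    return assignment, marginal

# Hypothetical need scores 1..20 for twenty made-up candidates.
scores = {f"c{i}": i for i in range(1, 21)}
assignment, marginal = assign_by_need(scores, clearly_needy=15, clearly_not_needy=6)
print(len(marginal))   # 8
```

Only the comparison within the marginal band rests on randomization; as the text notes elsewhere, such an experiment says little about those judged clearly needy.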

Demands on the research participant. One of the
simplest ethics-based criticisms of randomized
experiments is that regardless of the scientific and
social benefits of the experiment, it is a distinct
imposition on the research participant. Exactly the
same criticism, of course, can be leveled against
survey research of any sort and against
quasi-experimental and other types of evaluative
research. The research participant does indeed
provide a service to the researcher: information
about himself, his time, energy, and courtesy in
providing the information, and so forth. And insofar
as the social scientist profits (at least
intellectually) from the information he receives,
why should not the provider also profit? The
rewards, to be sure, need not always be large or
even tangible. For example, there is some evidence
for the contention that in certain types of
research, the interviewer's behavior, conversation,
and discussion of research do constitute a
temporarily rewarding experience for interviewees.
If higher demands are made of research participants,
they may be entitled to more tangible rewards for
their cooperation. Students who participate in
experimental tests of NIE-supported Career Education
Programs, for instance, are paid for providing their
opinions, reactions, for taking tests, etc.,
regardless of whether they were assigned to the
experimental program or to a control condition (the
conventional high-school programs in career
education). What the nature of the reward should be
in different types of experiments and how
alternative rewards affect the integrity of the
experiment need more empirical investigation,
however. The little data available on this topic
stem primarily from survey research where
alternative rewards have often been tested
experimentally to determine how rewards such as
money payments, small gifts, etc. stimulate
cooperation (e.g., the Sudman & Ferber (1971) work
on strategies for improving response rate in
consumer surveys).

Confidentiality of information. The problem of
assuring confidentiality of data is not confined to
experimental research but appears in survey research
as well. But the problem has been highlighted by the
Negative Income Tax Experiment, in which a county
prosecutor forced economic researchers to yield
research records on identified subsidy recipients
(research subjects). The case is a regrettable
illustration that the researcher may be cast
unwillingly into the role of informant if he does
not anticipate the possibility of judicial or
legislative appropriation of his records for
prosecuting some of his research subjects. There
have been some advances in resolving this and
related conflicts. For example, procedural and
statistical devices have been created to assure
confidentiality of respondents' reports without
undermining research goals (Boruch, 1974). Special
forms of testimonial privilege for social
researchers are being constructed to supplant or
complement technical devices for assuring that
research records are used only for research purposes
(see Riecken et al., 1974; Boruch, 1976, and
references therein). These approaches are imperfect,
but they are being field tested, and they do help to
reduce conflict between legal demands for individual
records and the social scientist's ethical
requirement of confidentiality of records on his
respondents.






Cook, T.D., & Campbell, D.T. The design and
conduct of quasi-experiments and true experiments in
field settings. In M.D. Dunnette (Ed.), Handbook of
industrial and organizational psychology. Chicago:
Rand McNally, 1975. Pp. 223-324.

Daniels, D.N., et al. DANN services program
(Research report, National Institute of Mental
Health, Grant No. 02332). January 1968.

Deniston, O.L., & Rosenstock, I.M. The validity of
designs for evaluating health services. Research
report, Ann Arbor: School of Public Health,
University of Michigan, March 1972.

Field, C.G., & Orr, L.L. Organizations for social
experimentation. In R.F. Boruch and H.W. Riecken
(Eds.), Experimental testing of public policy: The
Proceedings of the 1974 Social Science Research
Council Conference on Social Experiments. Boulder,
Colorado: Westview Press, 1975.

Fushimi, M. An improved version of the Sobel-Weiss
play-the-winner procedure for selecting the better
of two binomial populations. Biometrika, 1973,
60(3), 517-523.

Gage, N.L. (Ed.). Handbook of research on teaching.
Chicago: Rand McNally, 1963.

Goodwin, W.L., & Sanders, J.R. The use of
experimental and quasi-experimental designs in
educational evaluation. Research report. Boulder:
Laboratory of Educational Research, University of
Colorado, 1972.




fleferences 

(► *■ * * 

* • * Berhllf'ein,. I., 8c freeman, H.E. Academic and en-. 
* tre'preneurial research, yiew York: Russell Sage, 
1975. . . >j ^ 

Borftch, '^Y.' Assuring confidentiality in social re- 
search. Book manuscript in preparation. Evanston, 
Northwestern University, Psychology De- 
partment, 1976. . 

Bdruch, R.F. Bibliography: Illustrative r^p- 
ddlnized fielft experiments for pr(%ram planning 
and evaluation. Evaluation, 1974, '2, 83-87.. ' 

Boruch^ R.F. On common contentions about ran- 
' ^ domized field experiments. In R.F. Boruch an*d 
H.W. Riecken (Eds.), Experimental testing of public 
policy. Boulder, Colorado: Westview Press, 1975. 
. Pp. l07-]45. 

Boruch, R.F., Magidson, J., & Davis^, S. Interim re- 
port: ^Secondary analysis of Project Middle start. Paper 
presented'^at the annual meetings of the American 
Psychological Association, September 1975. 

Boruch,' R.F., Wortman, P.M., &: DeGr-a<:ie, J.S. 
Executive summary: Project on Secondary Analysis. Br e- 
sented at the Annual Meeting ,of the Arnei|tan 
Vocational. Research Association, WashingtorV, , 
^ D.C., April irf974- . ' ^ 

Bolfein, B. The Manhattan Bail Bond Experiment.' 
* Tej^as Law Review, 1 96^, 3 1 9-33 1 . ^ i 

^ BVazziel, .W.F. Effects of general^education in 
' manpower programs., fournal of Human ResouYces, 
1966, i, 39-44. ' ^ 

J Campbell, D.T., 8c Boruch,. R.F. Making the case 
for randomized assignment to treatments by cetn- 
sidering the a I te relatives: Six-ways in which qu^si- 
' experimental* evaluations in compensatory educa- 
tion tend to underestimate effects. In C.A. Ben- ' 
nett andT\. Lumsdaine (Eds.), Central issues in so- 
cial program fvaluatio'xi* New York: Academic Press, 
'\ 1975. * . 

' Campbell, IX/T., 8c Stanley, J.C. Experimental and 
K cfUQ^si- experimental designs for research. QJ?icag04« 
. T^affd McNally, 1966. 

Clarke, R. V. G., &. Cor'nish, D.B. The^contMled 
I S trial in institutional research. London: ^ome Office 
Research Studies, 1972. ' . 

Cohen^ P.C., &: Krause, M.S. (Eds.). Casfwork with " 
^ , wives of alcoholics. New. York: Family Services As- 
^ociatfbn of America, 197K. 

^ Conner, R.F. A methodological analysis of 12 tfue ex- 
penmetHal program evaluations. ^'h.D. dissertation. 
Psychology Department, Northwester^ University, 
*«vanston, Illinois, 1974. -t _ 




Gosnell, H.F. Getting out the vote. Chicago: University
of Chicago Press, 1927.

Heber, R., et al. Rehabilitation of families at risk for
mental retardation. Madison: University of Wisconsin
Rehabilitation Research and Training Center,
1972.

Hendricks, M., & Wortman, C. Reactions to random
assignment in an ameliorative social program as a function
of awareness of what others are receiving and of
outcome. Evanston, Illinois: Psychology Department,
Northwestern University, 1975.

Hill, A.B., Marshall, J., & Shaw, D.A. A controlled
clinical trial of long-term anticoagulant therapy in
cerebrovascular disease. Quarterly Journal of
Medicine, 1960, 29, 597-609.

Holt, N. Rational risk taking: Some alternatives to
traditional correction programs. Proceedings: Second
National Workshop on Corrections and Parole Administration.
San Antonio, March 1974. College
Park, Maryland: American Correctional Association,
1974.

Hornik, R.C., Ingle, H.T., Mayo, J.A., McAnany,
E.G., & Schramm, W. Television and educational reform
in El Salvador (Research Report No. 14). Stanford,
California: Institute for Communications Research,
Stanford University, August 1973.






Hovland, C.I., Lumsdaine, A.A., & Sheffield, F.D.
Experiments on mass communication. Princeton, N.J.:
Princeton University Press, 1949.

Knight, D. The Marshall program: Assessment of a
short term institutional treatment program, Part II:
Amenability to confrontive peer group treatment. Sacramento:
California Youth Authority, 1970.

McKay, H., McKay, A., & Sinesterra, L. Stimulation
of intellectual and social competence of Colombian
preschool-age children affected by the multiple deprivations
of depressed urban environments (Second Progress
Report). Cali, Colombia: Universidad del
Valle, Human Ecology Research Station, September
1973.

Meier, P. The biggest public health experiment
ever: The 1954 field trial of the Salk poliomyelitis
vaccine. In J.M. Tanur, F. Mosteller, W.B. Kruskal,
R.F. Link, R.S. Pieters, and G. Rising (Eds.),
Statistics: A guide to the unknown. San Francisco:
Holden-Day, 1972.

Porter, A.C. Introduction to NIE's Program on Measurement
and Methodology. Washington, D.C.: National
Institute of Education, March 1, 1975.

Riecken, H.W., Boruch, R.F., Campbell, D.T.,
Caplan, N., Glennan, T.K., Pratt, J.W., Rees, A., &
Williams, W. Social experiments: A method for planning
and evaluating social programs. New York:
Seminar Press, 1974.

Ruffen, J.N., Grizzle, J.E., Hightower, N.C.,
McHardy, G., Schull, H., & Krisher, J.S. A co-operative
double-blind evaluation of gastric "freezing"
in the treatment of duodenal ulcer. New England
Journal of Medicine, 1969, 281, 16-19.

Rutstein, D.D. The ethical design of human experiments.
Daedalus, 1969, 98(2), 523-541.

Sackett, D.L. End results analyses in a randomised
trial of nurse practitioners. Research memorandum.
Hamilton, Ontario: McMaster University Medical
Center, Burlington Study Group, 1973.

Sudman, S., & Ferber, R. Experiments in obtaining
consumer expenditures by diary methods. Journal
of the American Statistical Association, 1971, 66,
725-735.

Thistlethwaite, D.C., & Campbell, D.T.
Regression-discontinuity analysis. Journal of Educational
Psychology, 1960, 51, 309-317.

Wargo, M.J., Campeau, P.L., & Tallmadge, G.K.,
with assistance of Lauritz, B.M., Morris, S.J., &
Youngquist, L.V. Further examination of exemplary
programs for educating disadvantaged children. Palo
Alto: American Institute of Research, July 1971.

Welch, W.W., & Walberg, H.J. Pretest effects in
curriculum evaluation. American Educational Research
Journal, 1970, 6, 605-614.

Yinger, J.M., Ikeda, K., & Laycock, F. Middlestart:
Supportive intervention for higher education among students
of disadvantaged backgrounds (Final Report to
U.S. Office of Education, Project No. 5-0703,
Grant OE 6-10-255). Oberlin, Ohio: Oberlin College,
Sociology Department, November 1970.

Zener, T.B., & Schneulle, L. An evaluation of self-directed
search: A guide to educational and vocational
planning (Research Report No. 124). Baltimore,
Md.: Johns Hopkins University, Center for the
Study of the Organization of Schools, 1972.






Development of Staff for Evaluations 

(A Retrospective View)



George L. Kelling 
Evaluator 
Police Foundation 
Washington, D.C. 



George Kelling is on the staff of the Police Foundation and has been working as an evaluator in Kansas City and Dallas over the
past several years. In particular, Kelling was the Director of Research for the major study of police patrol practices carried out in
Kansas City. In gearing up for that project he had to put together from scratch and manage a large and complex research team. This
paper presents his views on the problems that are likely to be encountered in putting together an evaluation research staff and on
approaches to solving those problems.






When in confirmation class as an early adolescent,
I, as many other young Lutherans, was
forced to memorize Luther's explanation of the
three sections of the Apostles' Creed. While no
longer able to pull the explanations back into consciousness,
I can clearly recall the last sentence of
each explanation. The phrase, identical in each,
was "This is most certainly true." The matters
Luther was dealing with were, of course, eternal
verities. While they may or may not be "most certainly
true" for others, they were for Luther and
he emphasized their importance to himself and his
followers with his declaration.

As a result of administering many evaluations,
I have been asked to talk to you about developing
personnel for work in evaluative research. While
the positions I take in the following pages certainly
do not approach, for me at least, the state of eternal
verities, they do achieve the level of pragmatic
and survival verities in the conduct of evaluations.
Part of this feeling comes from a set of values and
assumptions which I have and which perhaps it is
worthwhile for me to identify. These include:

1. It is good to complete evaluations; few
really are.

2. It is good to maintain "experienced leadership"
in an evaluation staff; read that: "I
want to survive."

3. It is good to maintain "experienced leadership"
in the organizations in which evaluations
are conducted. Need I explain the
worth of that to you?

4. It is not that the best predictor of an individual's
or organization's performance is
his/her/its past performance; it is the only
predictor.

5. And finally, conflict in the activities of organizations
and personnel need not be deleterious
to achievement but rather, if the
rules of conflict are established, can contribute
to creative and original work.



With those values, assumptions, confessions
out of the way, I will continue with one final venture
into the rarefied air of theology with a paraphrase
of a statement by Paul Tillich.

I shall proceed to lecture now, and continue to
perform in evaluations on the assumption that
I am absolutely correct in what I am about to
say. I am aware that I may be wrong, but I will
not let that awareness interfere with this discussion
or my future performance as an administrator
of evaluations.

If any of you, as you read or hear this, feel like
standing, applauding, and cheering, I, of course,
invite you to. If on the other hand you feel like
booing and hissing, there is nothing I can do to
stop you, so feel free.

Verity #1. Where one's tenure is, is where one's
heart is; or, the use of consultants.

The use of consultants is standard in evaluations
and evaluation proposals. Generally consultants
are luminaries from academia who have a
superb record of research and thinking about
methodology and/or service delivery in a particular
endeavor. They are generally competent, leaders
in the field, and involved in a myriad of enterprises.
Generally they are capable of, and have
executed, good research and/or evaluations. They
are experts. Basically, they can serve two functions
in an evaluation:

1. They can help "young comers" get grants,
contracts and exposure. If done responsibly,
this is legitimate and ought not to be
sneered at. The function of a mentor or
sponsor is an important one in academia.
"Young comers" present a high risk to both
program and evaluation administrators and
grantors, but at the same time they have the
energy, and are enough "on the make" to
complete an evaluation. The "baptism" of
"young comers" by luminaries must be understood
for what it is, however. Do not expect
the "heavies" to conduct the evaluation
or write the results. They cannot and will
not.

2. They can provide technical consultation on
critical points of an evaluation. Three critical
points stand out:

a. "Now that I have all this data, why did I
collect it in the first place and what
should I do with it?" In other words, it is
possible that the evaluator will get so immersed
in details that he/she will forget
what the original goals of the project were
and how the data deal with those goals.
Further, after being removed from the
world of academia during the year or two
of the evaluation, the evaluator may need
some assistance in updating his/her statistical
skills. The consultant or consultants
can help the field staff of an evaluation to
review their work and update skills.

b. Review the outline for the presentation of
the findings. This is related to "a" and is
part of "a" yet is so important that I separate
it out. Getting a good outline of the
final report is the critical issue in getting
the evaluator to put his/her pen to paper.
It nicely makes a completely unmanageable
task (completing the report) into a
manageable one.

c. Finally, reading the preliminary drafts of
the evaluation and providing constructive,
non-threatening advice. Generally
upon completing the first draft, the
evaluator thinks (hopes) that he/she is
finished writing. In fact, he/she has just
begun. Remember, any first draft, regardless
of its weaknesses, is good. If an
evaluator is reasonably good and has
good consultation, a first draft almost
assures completion.

So much for the positive contributions of consultants.
They can make real and substantial contributions,
but for all parties involved in the conduct
of an evaluation, it is certainly best to underestimate
their contributions rather than to overestimate
them.

They cannot:

1. Supervise staff. Young, energetic staff need
constant and ongoing stroking, direction,
love and supervision. Consultants cannot
provide that. They do not have the time,
nor do they control the means and rewards
necessary to manage staff.

2. Develop evaluation instruments (questionnaires,
etc.). Instruments must be developed
by resident evaluation staff in close collaboration
with agency program staff. Consultants
don't have the time, energy, and, generally,
the patience to collaborate as closely
as necessary.

3. Write up results. The writing of the final
report is a consuming full-time task. Consultants
are involved in too many things to
be expected to write up a final report.

The key thing to remember in dealing with
consultants (and I do not mean this critically) is
that they are unresponsible. They are bright,
knowledgeable, clever, but they have no responsibility
for the final product and rarely, if ever, will
be cornered into accepting such responsibility.
They have different responsibilities and will meet
those first, and that is to be expected. Neither
the program evaluator nor administration should be
surprised by this as likely they, too, are consultants
some place. This is most certainly true.

Verity #2. The children shall lead you (or at least
they will do most of the necessary
"grunt work"). Staff structure.

I will divide this section into two parts: first,
the characteristics of evaluation staff, and, second,
the characteristics of the evaluation director.

Perhaps it will be easiest if I begin with the
characteristics of the staff who are "on site," and
who do the daily work of evaluation. (Be clear that
I did not always know these verities, and not even
when I knew them did I always follow them. One
result is that in the early projects I have administered,
the casualty rate of project staff was very
high. In the early days, I often took those persons
for staff who were available at the time. Some were
less than satisfactory. Applicants were few. I had
no track record as an evaluator. Evaluation was
considered inappropriate, read "inappropriate"
as "sinful," by major professors for their good
students. But, I am getting ahead of myself.) The
people who actually "do the daily work" of an
evaluation have to have certain characteristics.
These include: high levels of energy, methodological
sophistication, skill at handling data, keen intelligence
and curiosity, being professionally "on
the make," the capability of using creatively the
great freedom that evaluators have, and the ego
strength to move with some comfort into an alien
environment. The staff need not have, and, if you
recruit the proper persons, probably will not have,
organizational "smarts," familiarity with the field
of service delivery, or experience in completing a
project. (I will discuss these points somewhat
later.)

Where are these kinds of people found? (The
people who do the daily work.) The answer is quite
clear. In the doctoral programs of universities.
And as important, in the doctoral programs of
good universities. Their characteristics are as follows:

- They have been born, bred, and expect to
die in universities.

- They have never held a job (except maybe
Vista or a summer camp).

- They have managed to make avoiding deadlines
a fine art and skill.

- They are arrogant. (Often they are right:
they are more methodologically skillful than
their professors and, later, than you as project
director.)

- They know how to develop sophisticated
questionnaires but they do not know how to
talk to people (read "talk" as interview without
a pretested questionnaire). They will
have to be driven, almost with whips, to
work closely with agency program staff and
to really talk to them (but once they do,
another problem, that of cooptation, rears
its head, which we shall discuss in detail
later).

- They view all researchers and grantsmen
who operate outside of universities as
whores and "operators" interested only in
the "bucks" (they really believe that their
professors live on their salaries alone) and
that all truth is to be discovered in the world
by conducting methodologically "pure" experiments
on freshmen.

And finally, they're marvelous. They believe
the world can and will change, they work night
and day, they're damned smart, and they have that
marvelous characteristic of youth: energy. (Oh, I
know it's unbounded and undisciplined, but evaluation
directors have to do something after all.)

But now in a somewhat more serious vein, I
wish to talk about each of the characteristics that I
find necessary in staff.



High Levels of Energy 

Evaluations are difficult and time consuming.
They combine all the intellectual and methodological
rigors of laboratory experiments with the messiness
and complications of the real world. The
real world presents a myriad of problems that
take a great deal of energy to solve.
The following are examples.

Agency records are not devised for research.
Often when computerized, they contain errors and
omissions which, while not a problem for agency
administrative purposes, are in such a condition
that it is necessary to return to the original documents
when they are used for research or evaluation.*
(I don't mean to offend agency officials at
this point, and maybe it is different in the medical
field, but for the most part all agency data have to
be verified for research purposes and every evaluation
which is based upon agency data which have
not been verified in great detail is a terribly suspect
evaluation.)

* Not only is a high level of energy necessary but also dealing with
these sorts of problems requires a gift for great attention to detail and a
toleration for the tedious, characteristics sometimes different from or
in conflict with high energy levels.



As Mr. Lewis points out in his paper, oftentimes
agency program managers who are responsible
for the administration of an experiment care
less about maintaining the controls of the experiment
than they do about "starts" or exporting the
program to other areas or jurisdictions. I would
underline Mr. Lewis' point about "starts" and recommend
that each of you re-read it. The dynamics
and consequences of it are substantial. Given the
media's interest in "starts" and the fact that
everyone gets bored with continuing programs, the
evaluator must attempt to carefully deal with and
exploit both the initial publicity from "starts" and
the subsequent obscurity when the experiment or
program is ongoing. The management of the
momentum of an experiment is critical and a balance
has to be developed between the extremes of
the publicity and momentum of the "start" and
later obscurity of slowdown. Obscurity has both its
benefits and problems. Generally, the momentum
gained from the initial thrust will not provide
enough energy to complete the task. Occasionally
"boosters" from agency program and evaluation
staff are absolutely necessary to obtain the goal of
a completed program. Alertness to the maintenance
of the ongoing program is essential for
evaluation staff.

A variation of the problem is "restarts." That
is when an agency administrator decides that the
indicator of his/her wisdom and skill is his/her ability
to replicate the program in other departments,
divisions, etc., before the evaluation is completed.
This not only consumes a great deal of staff
energy (both of agency staff who are pushing to do
it and of evaluation staff who are trying to stop it)
but also potentially destroys the experiment or
evaluation by contaminating control areas.

Personnel involved in program efforts may
have more of a vested interest in the success or
failure of a program than in the conduct of the
experiment and as a result inadvertently (or purposely)
attempt to bias the outcome. Evaluators
must constantly monitor, in as discreet a manner
as possible (as monitoring itself may develop resistances),
all planned stimuli, controls, and data collection.

Dealing with these and a myriad of other problems
simply requires a high level of alertness and
effort for a prolonged period of time. There is
much "dirty work" which has to be done and on-site
personnel have to have the endurance to do it.
(In one city the "dirty work" meant night work for
at least six weeks in a record division. That was in
addition to the regular day activities.)



Methodological Sophistication 

Often the exigencies of real world agency existence
are such that program evaluation can be
quite complicated. Finding the right design, that
is, an evaluation design which is as powerful as the
program allows and warrants, requires considerable
methodological sophistication. The "matching"
of program and evaluation design is not to be
accomplished by returning one more time to the
bible of Campbell and Stanley but rather comes
through the careful "wedding" of research techniques
and operating programs. There is nothing
mysterious about this. The evaluator simply must
"muck around" in the program, data, and funds
and find a design which is appropriate to the program
operation, the funds available, the importance
of the program, and the available data. That
means that the staff must know design and scientific
method and not just have a shopping basket
of designs, one of which she/he pulls out for this
program.

Skill at Handling Data

Two important things have to be said about
this.

One staff member has to approach the
psychological state of being an obsessive compulsive.
If someone does not keep careful record of
every decision made regarding design and data
storage, the disaster of having to reconstruct those
decisions will result in the waste of spending the
time re-doing things and also of not meeting deadlines.
Not that things cannot be reconstructed, and
generally they can, but to have no way of identifying
which questions were related to what indicators
means a period of reconstruction beyond
that normally required to re-familiarize oneself
with the material. Two examples. In the Kansas
City Preventive Patrol Experiment, the details and
records of the sampling procedures for the community
survey were never gathered together in
one file or written up when the sample was drawn.
When, 18 months later, we had to discuss the sampling
procedures, at least three people in three
different organizations had to search their files for
the various memos, instructions, etc. It was possible,
but that which was easy to do at one time became
complicated at another. On the other hand,
in Dallas we did two departmentwide surveys. The
T1 survey was completed in 1973, the T2 survey in
1976. Because we had carefully documented the
source of every question, all coding decisions, and
every other decision, the time necessary for review
was spent relating the theories under which we
operated to the forms of analyses we were to use.
Thus, an axiom emerges. Never, never, never rely
on memory. Rely on it only to fail, and even
worse, to deceive.
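
For readers who keep such records electronically, the documentation habit described above might be sketched as follows. This is only an illustration under stated assumptions: the names `Decision`, `QuestionEntry`, and `Codebook` are hypothetical and are not taken from the Kansas City or Dallas projects. The sketch records the source of every question, the indicator it is meant to measure, and each dated decision about it, so that nothing has to be recalled from memory.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class Decision:
    """One documented design, sampling, or coding decision (hypothetical layout)."""
    made_on: date
    topic: str          # e.g. "sampling", "coding", "storage"
    description: str
    decided_by: str


@dataclass
class QuestionEntry:
    """Ties one survey question to its source and to the indicator it measures."""
    question_id: str
    source: str
    indicator: str
    decisions: List[Decision] = field(default_factory=list)


class Codebook:
    """A single place where every question and every decision is written down."""

    def __init__(self) -> None:
        self._entries: Dict[str, QuestionEntry] = {}

    def add(self, entry: QuestionEntry) -> None:
        # Refuse silent overwrites: a question is documented exactly once.
        if entry.question_id in self._entries:
            raise ValueError(f"{entry.question_id} is already documented")
        self._entries[entry.question_id] = entry

    def log(self, question_id: str, decision: Decision) -> None:
        # Attach a dated decision to an already documented question.
        self._entries[question_id].decisions.append(decision)

    def questions_for(self, indicator: str) -> List[str]:
        # Answers "which questions were related to what indicators".
        return sorted(
            e.question_id for e in self._entries.values() if e.indicator == indicator
        )

    def history(self, question_id: str) -> List[Decision]:
        # Decisions in the order they were made, oldest first.
        return sorted(self._entries[question_id].decisions, key=lambda d: d.made_on)
```

With a record of this kind, a review held years after a first survey can begin from `questions_for` and `history` rather than from a search through several people's files.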
The second area of the importance of data
handling has to do with the assessment of agency
records. This is no simple matter, especially in
police agencies, but I suspect in other agencies as
well. Again I want to emphasize that I imply no
criticism of agency records. I simply have no way
of knowing whether they are adequate for administrative
purposes. I assume they are. You are in a
better position to know that than I. I do know,
however, that almost all records will need considerable
work to be suitable for research purposes.
If the records are computerized, considerable
work will have to be done to insure their accuracy
and reliability. (Even at that, evaluators must approach
them cautiously since much of it is self-reported
information, i.e., crime and activity
analyses, which are subject to manipulation,
whether conscious or unconscious, to show desired
or self-serving results.) If records are kept in
manual files, other problems, such as coding, or
agency policies which allow for several file systems,
emerge. (In one police department complaints
against police officers are kept in three different
places, depending on where the citizen first filed
his complaint, and may or may not be stored with
the other units. Notice the phrase "may or may
not" since that complicates things considerably. If
any officer has complaints filed against him in
more than one location, and many do, the
evaluator has to carefully read each one to determine
if they are separate or the same complaint.
Thus, even establishing the "n" of complaints is
not an accounting task but an analytical task.)
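
The complaint example can be made concrete with a small sketch. It is a hypothetical illustration, not the procedure used in any department: the record layout and the matching rule (same officer, same incident date, same complainant name after normalizing case and spacing) are assumptions chosen for the example. The code does not decide which complaints are identical; it only flags groups of copies held in more than one file system so that an evaluator can read them.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Complaint:
    officer_id: str
    complainant: str
    incident_date: date
    filed_at: str  # which file system holds this copy (hypothetical labels)


def _match_key(c: Complaint) -> Tuple[str, str, date]:
    # Assumed matching rule: normalize case and whitespace in the name.
    name = " ".join(c.complainant.lower().split())
    return (c.officer_id, name, c.incident_date)


def group_candidates(records: List[Complaint]) -> List[List[Complaint]]:
    """Group records that may describe the same complaint."""
    groups: Dict[Tuple[str, str, date], List[Complaint]] = defaultdict(list)
    for record in records:
        groups[_match_key(record)].append(record)
    return list(groups.values())


def needs_review(groups: List[List[Complaint]]) -> List[List[Complaint]]:
    """Return groups with copies in more than one location, for manual reading."""
    return [g for g in groups if len({r.filed_at for r in g}) > 1]
```

The raw number of records and the number of candidate groups will generally differ, which is the sense in which the count is an analytical result and not a tally.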

The evaluation staff has to know what they
know, both in terms of recalling decisions and assessing
data. Both tasks are far more complicated
than generally thought.

Keen Intelligence and Curiosity

In some respects this is self-explanatory. But
while keen intelligence and curiosity are necessary,
they are not sufficient. They have to be combined
with many of the other characteristics described in
this section. Without energy, discipline, and
creativity, intelligence simply is not enough.

Let me add one thing about curiosity as I
think it to be quite important. The characteristic of
asking "why" is absolutely essential. In the first
place it helps to keep the intelligent person from
seeing the emperor's clothes. The "emperor" can
be the agency, the evaluation director, or colleagues.
Secondly, it helps the evaluator pursue
unanticipated findings. And, if properly pursued,
these unanticipated findings can be quite important
to an evaluation. It might mean the evaluation
is on to something new (I call your attention to Mr.
Bieck's study of police response time. The surprise
finding of the length of time it takes citizens to report
even serious crimes is not only of great research
and program interest, but is also an indicator
of just how poorly thought through the
whole business of the importance of police response
time has been by police, researchers, and
evaluators.) or reflects an artifact of improperly
stored or analyzed data. The evaluator who, out
of his/her curiosity, continues to pursue those
leads either enriches the evaluation immensely or
saves it from spurious findings.



Professionally "On the Make"

Perhaps it is purely a personal matter on my
part, but I simply have an easier time dealing with
people who know what they want. I find it difficult
to deal with people, on a project level at least, who
are indecisive about their own goals. (By that I do
not mean that everyone who comes onto an evaluation
staff has to know that he wants to do evaluation
research in a particular service delivery system
for the rest of his life.) She/he may want to gain
research experience, get publications, examine a
service system, or do a variety of other things, but
they have some sense of their own goals. If that
"purposefulness" is not present in staff members,
I have been unable to develop it. (And I
don't mean that an individual's goals can't change,
but purposefulness remains.) The casualty rate of
those who have not been purposeful has been very
high.

Those people who are beginning their careers
and are purposeful do not yet know the
prices of long hours and crash production schedules
which they will have to pay to obtain what
they want. But, they will learn that, and most
people "on the make" are willing to pay those
prices. People who are not aggressively purposeful
simply aren't motivated enough to "pay the price."
(That makes sense. If you don't know what you
want, why should you "pay the price"?)

A side comment here. People who do evaluations
live on grants. (I try to think of evaluation
bureaucracies that do not live on grants; none come
to mind.) While medical, social, police, and other
service systems have ongoing existences independent
of most specific projects, evaluation people
either live from grant to grant, or work full-time
in a university or consulting firm and do evaluations
part-time. The result is that for an evaluation
capacity to survive, not only must it do the evaluations
at hand but it must also have the resources
(primarily time) to generate new proposals. The
alternative is constant "gearing up" and "dismantling"
a staff, neither one of which fosters established
organizational skills or working relationships.
Thus, in my judgment, evaluators must be
prepared to "pay the price" of constant pressure to
both complete and generate activities simultaneously.
("Workaholics" make good evaluators.)

One final comment about being "on the
make." I believe that most good evaluators are
from universities and will and ought to return to
universities for rest and recreation (in the finest
sense of recreation, that is re-creation of knowledge
and skills). In order to do that, publishing is
an absolute necessity. Thus, from the beginning, I
have tried to insure that the data collected will be
not only necessary for evaluation, but also, wher- [...]
that I have not always been completely candid
about this to agencies or the Police Foundation.
We have called these interests the "oh by the
ways." To insure the protection of agencies, I have
always assured them, and meant it, that nothing will
be published without their review and permission.
The resultant problems are different than one
would expect. First, the agencies encourage
publications; administrators have found that
agency reputation is enhanced by such activities.
Second, and this gets to be a problem, oftentimes
agency administrators get to be more interested in
the "oh by the ways" than in the evaluations. (The
consequence of this is that staff time can be diverted
away from evaluation-specific activities to
less critical issues at the wrong time.)

But the point is that the data, if properly collected,
can be available for publication independent
of whether the program succeeds, fails, never
gets off the ground, or collapses in the middle
(that does happen, unfortunately much, much too
often) and young staff can get the publications
necessary for their own careers. And, I would add,
data is just too expensive to collect to be used for
only one purpose. If, at no or relatively little expense,
data can be collected which is multi-purpose,
it seems to me only prudent to do so.

Capable of Using Freedom

For some young researchers, the freedom
provided in evaluation is such a burden that they
just can't handle it. They search for day to day direction,
are terrified of making mistakes, withdraw
into obsession about codes or analyses, can't start
to write a report because all they can think about is
the final product rather than just the page they are
on, get preoccupied with the administrative issues
of evaluation rather than evaluation itself, etc.
At worst they begin to "rip off" freedom, using
their time for activities other than the evaluation
work. (Not that I feel that staff should not be involved
in other consulting, lecturing, etc., activities.
I think they should. It gives them wider
exposure at somebody else's expense; they en-
ever possible, be useful as sociology, political sci- 
ence, or psvcholQ^ and thurresulffn publications* 
independent of the evalCiatioj?/ I must also confess 

' ■ ■' He 



haQce the reputatidi\ of the entire xap^city , and it^ 
k*eeps ^hem from, being* too narrowly focused t)n 
* particular^ pFojects. But thev ^mu^ do so ^^^^heir 
own expense, .not at the ex'pfflse of the' ^valua- 
tion.) Finally, they may became so cynical that 
termination* is'inevitable. TJiey are. not necessarily 
''bad-*' people, il's'just that the available freedom 
simply leaves them unable. to function. ^ 

For others, freedom is an opportunity to
respond flexibly to the myriad of complexities that
occur during the process of an evaluation. They
(and I think I am covering somewhat similar
ground as I did when I talked of purposefulness)
feel comfortable making decisions and making
mistakes. They are far more comfortable
communicating to the project director what they have
done, why they have done it, and (at first I found
this surprising) what mistakes they have made. It
turns out that, while they obsess less, they are far
more thorough in recording their decisions. And
finally, when in a jam, they look for help. Those
who can really handle freedom are open and
communicative. Those who cannot, turn secretive.
And once the vicious cycle of secretiveness begins,
I have not yet found a way to interfere with it.

One final comment: there are good people
who, at times, seem to go into a work moratorium.
Generally, those periods occur during the quiet
periods of an evaluation. It seems that they go
through periods when they can't get anything
done, and just can't get started. They, different
from those who can't handle freedom, will often
feel quite guilty, some even going so far as to
suggest a reduction in paid time during this
period. They are in need of support and
assurances that the moratorium will pass and that when
"the work crunch" comes they will more than
make up for lost time.

Move Into an Alien Environment

I will begin this section by paraphrasing
William Goode who, in one of his books on
occupations and professions, says something like the
following:

    Men at work and forests appear peaceful but
    upon close examination one finds that in both
    [work and forests], struggle is both swift and
    deadly.

It would be nice to believe that evaluators and
agency personnel could work together happily and
productively with little or no conflict, but that
seems rarely to be the case. And it isn't that lined
up on one side are the "good guys" and on the
other "the bad guys," or that one set of activities
is reasonable and another unreasonable, or that
which one group is doing is more important than
that which the other is doing. In fact, "good guys"
are on both sides, both sets of activities are
reasonable, and both important. The problem is that
agency personnel, whether knowing it or not, turn
power over to evaluators when they contract for an
evaluation. While it is unfortunate that this is
rarely made explicit when the contract is made,
and even more unfortunate that it is only barely
understood when it is made explicit, this transfer
of power is a powerful determinant of
evaluation-service agency relationships. Let me give an
example. If an agency decides to do an experiment, the
administration will impose restraints on the
discretion of administrators to transfer personnel, start
new programs, reallocate equipment, adjust
schedules, etc., etc., etc. It is immediately apparent
what this does to the formal power structure of an
organization. Just contemplate for a moment
what it does to the informal power structure. And
the evaluator becomes, at times, the "tattler" and,
depending upon circumstances, at other times, the
"enforcer." (It should not be surprising that in the
eleventh month of a year's experiment even the
chief, or top administrator, will want to give in to
his subordinates. Often then only the threat of loss
of external funds can assure completion.)

This conflict is compounded by the fact that
often evaluators have different norms, goals, and
lifestyles than agency personnel (this is especially
the case for evaluators who deal with police) and it
is possible for mutual "culture shock" to develop.
The evaluator is often not used to the 9:00 to 5:00
day of many agencies. As a student he/she found
that the computer was less expensive and more
accessible after 11:00 p.m. His/her work patterns
were made more tuned to his/her own personal
rhythms than those of an organization.
Bureaucratic niceties seem irrelevant. Adjusting to political
realities seems dishonest. And so it goes. Both
evaluation group and service agency find the work
and lifestyles of the other alien. And little can be
done to change that. Both staffs can learn to
respect and tolerate each other, but only if they
understand that conflict is not to be avoided, but
rather managed.

So far I have talked exclusively about the
necessary characteristics of field staff members of
an evaluation. I would like to talk briefly about key
characteristics of project directors. (Just as in the
previous discussion, I shall be talking about the
ideal. I am certain that just as perfect field staff do
not exist in nature so neither do perfect project
directors. The extent to which I, as an evaluator,
approach the following characteristics is unclear. I
will not burden you with my own assessment of
how I rate in striving for the ideal.)

Although I think other characteristics are
important, I will identify three key ones:
organizational "smarts," familiarity with the service
delivery system, and experience in completing a
project. I will keep comments about these to an
absolute minimum.

Organizational "Smarts"

To me, administration and intra-organizational
work is, to a large extent, the effective
use of power to get particular tasks done
excellently and then distribute fairly the benefits
which accrue from getting the job done. Lined up
against the struggle to get work done excellently
are the work patterns, procedures, and
organizational rules of grantors, sponsoring agencies,
review groups, evaluation agencies, etc. Think of
many of those for a moment.

Planning periods are not allowed. Generally a
program is funded and started and then the
evaluators are called in. False starts are not
allowed. If, as in Kansas City, a false start occurs,
most often the response is to "make do" rather
than start over. (Read "make do" as "waste all the
money, not just part of it.")

Failures are not allowed to be published.
Rather than publish a failure so that other people
can learn, the tendency is to squelch a failure (so
that other people can also fail).*

Decisions are not allowed. Often the
administrator asks the question, "What does the rule book,
organizational manual, etc., say?" The obvious
conclusion is if the rule book says it can't be done
then it can't be done. (What marvelous freedom
for the administrator! All the perquisites and
none of the decision-making.)

(Let me apologize to those of you who
consider me outrageously irreverent in my attitude
towards organizational rules and procedures. I
have become convinced that the purpose of most
rules is twofold:

1. They are to protect against "rip-offs,"
   although I suspect that more often than not
   they serve to stop the very minor expense
   account "rip-offs" rather than the really
   gross ones.

2. They protect administrators from having to
   make decisions.

But let me add, it would be an over-simplification
to say that procedures and work patterns ought to
be removed. They ought not to be. They serve an
important function. When properly administered
they can protect agencies, grantors, etc., from
gross rip-offs and absolute incompetence.
Unfortunately, the rules, etc., do little to encourage
excellence and can interfere with such achievement.
The key is that an effective administrator has to
learn how to wend his way through such rules,
using them, if possible, to his advantage in getting
the tasks done. There are various strategies to do
this. I have known and seen "creative bureaucrats"
who work 9:00 to 5:00 hours, take breaks and
lunch at precise times, and who, because they
know the rules and play the rules better than
anyone else in the organization, use those rules to get
jobs done. They are beautiful to watch because
they have really mastered the skills of bureaucracy
and remember that, ideally, the function of rules is
to get a job done. [I have also seen accountants
who understand that money is to spend to get a job
done. Not spending money is no merit. It can be
irresponsible not to spend money.] There are
strategies other than being a "creative bureaucrat,"
but the skillful administrator learns how to use
rules to his/her benefit. These skills are developed,
honed, tested, in the world. They are not taught in
universities and rarely talked about in
bureaucracies. Learning them is accompanied by the
acquisition of bruises, welts, scars, burns and age.
Age alone doesn't do it, but it is only through the
attainment of experiences to be reflected upon
that these skills can be acquired. There are
mentors and tutors to be had, but they rarely formally
teach. Most often they put you through it. At early
stages of your career you know only after you've
been through a particular lesson and you sit
bruised and smarting that you have been taught.
Later, you know as it happens, and while you may
not particularly enjoy it at that time, you can
admire the skill with which it is accomplished. [But if
you have concentrated during your early lessons,
there really aren't all the accompanying pains, just
generally the reminder that when doing complex
work it is necessary always to be very alert.])

* This is really a very complex issue and one that can only be referred to here.
The publication of failures is dangerous to agency administrators because it simply
provides another weapon to those who are always lurking in the wings waiting to
exploit any mistakes made by competent people who make mistakes and are willing
to admit them. As a result the publication of mistakes has to be carefully
orchestrated.

The coupling of energetic, bright, relatively
undisciplined young researchers with a seasoned
organizational veteran who can provide a certain
amount of structure (or the appearance of
structure) seems to me a likely guarantee of a
reasonable success in completing an evaluation.

Familiarity with the Field of 
Service Delivery 

While I am not sure the following assertion
will be absolutely clear, I nevertheless want to
begin with it. I am not interested in evaluating
particular programs. I am interested in evaluating
methods and strategies, not programs, and I think my
clients are best served when I do. Let me explain that. The
important principle here is generalizability. A
program is only of general interest when it
exemplifies methods, skills and strategies which
are relevant to a wide variety of settings. Programs
may or may not be that generalizable. If a program
is so dependent upon local circumstances that it
cannot be exported to other settings, I, as an
evaluator, am simply not interested in it. It may be
that it is of legitimate interest to the agency
program officer. But I am interested in developing
the knowledge base about the effectiveness of
methods and strategies which are transferable in a
broad field of service delivery. In order to see the
broad application of a project, an evaluation
director must know that service delivery system, must be
aware of the intellectual traditions that have given
rise to the present knowledge and skill base of that
profession. And, it seems to me, she/he must be
able to help the client place her/his program in the
context of those traditions. If the evaluator can't do that,
outcomes are meaningless.

I did not include this in the characteristics of
evaluation staff. If they had such knowledge
of the field when they started, that clearly
would be desirable. But it is not essential, provided that the
evaluation director makes certain that staff acquire
it during their work. Staff will, if highly motivated
(one clue to the curiosity, skill and interest of an
evaluation group is the extent to which they
quickly start immersing themselves in the
literature to acquire familiarity), acquire familiarity with
service theory in relatively brief periods of time.
(Methodological sophistication cannot be acquired
so quickly. That has to be learned by doing as well
as studying.) But since the project director is the
person who will be setting the general directions of
the evaluation group and providing the overall
guidance, it is essential that he/she know the
substance and theories of the field.

Experience in Completing a Project

Evaluations don't complete themselves. A staff
can be skilled in data collection, analysis, theory
building and grantsmanship and still not be able to
complete an evaluation. The best of people can
block in completing an evaluation. It's almost a
stage in research or evaluation. The person who
has been through completing a project knows the
project can be completed. The fact that at least one
person knows it can be completed is critical.
Outlines circulated widely to colleagues and
consultants can help disperse the feeling of hopelessness
which develops when people sit down to write
after five years of work and $600,000 of funds.
And, if they have kept their records, exploited the
resident obsessive compulsive, and if they can
narrowly concentrate on the questions the program
addresses rather than the "oh by the ways," the
first rough draft is half written by the time they sit
down to write. (In other words, if the project has
been well run, the writing of the final report began
with the development of the original grant. Report
writing implements include scissors, scotch tape,
xerox machines, as well as pencil and paper.)

These then are the characteristics that I find
essential in good evaluators, both staff and
director. No doubt there are other characteristics which
should be addressed here, but, at least for me, the
mentioned ones are most critical. This is most
certainly true.

Other miscellaneous Verities:

Verity #3. In order to understand one (police
officer, physician, nurse, social
worker) you must not be one (the
other side of "In order to understand
one, you must be one"), or, cooptation.

Much police, social and medical work is
perceived of, and often is, exciting and important. For
young persons who have hardly seen the outside of
a university, such real world work will be attractive
and interesting. For many it will be a welcome
relief from the years of thinking and reading rather
than doing. Their high degree of interest in such
activities makes them especially vulnerable to
cooptation.

My own experiences have led me to the follow- 
ing points of view regarding cooptation. 

1. It is to be expected. It is a stage that all
   researchers must go through if they are
   properly sensitive to their subjects.

2. Cooptation is a trade-off. Whether agencies
   and evaluators do it consciously or
   unconsciously, both try to seduce the other to
   their respective points of view. In so doing,
   both allow an unusual amount of access to
   the "secrets" of their organizations. When
   remission from cooptation occurs, the
   researcher (or professional) is generally much
   wiser about the other organization and
   him/herself.

3. Although there are counter-strategies, i.e.,
   supervision and creation of a staff culture,
   most often remission is spontaneous and
   occurs when a terribly biased initial report is
   reread with horror and shock several
   months later. (Here, good supervision
   points out the universality of the ailment, is
   supportive, and recognizes it as an
   important learning opportunity.)

4. There is no subsequent immunity to it. It
   happens over and over, even to crotchety
   old project directors.

5. If remission does not occur, more likely
   than not it is terminal and career counseling
   is in order. Unreconstructed co-optees are a
   disaster to evaluations. They are divisive,
   secretive, and generally have all the zeal of
   religious converts. Truth is theirs alone.

6. Symptoms include (for police evaluations;
   people doing evaluations in other agencies
   will have to fill in their own specifics):

   a. Wanting to carry a gun.

   b. Feeling that nobody really understands
      the police as well as you do.

   c. Becoming a police "buff."

   d. Overemphasizing confidentiality. (When
      cooptation has occurred, the principle of
      confidentiality includes and more often
      than not is specifically targeted at the
      project director. The researcher feels
      that he must "protect the poor police
      department and police officer" from the
      rapacious project director.)

   e. Developing the police "swagger."

   f. Using police jargon.

   g. Wanting to get involved in the action, i.e.,
      help with arrests, etc.

   h. Ignoring findings or "twisting the text to
      meet the message."

And finally, I would argue that the staff
member who is never cooptable simply is too
disinterested or too far removed from the issues.
Cooptation is like sex and love relationships. You might
not want it all the time, but without it there's
boredom and disinterest. This is most certainly true.

Verity #4. The only truly unforgivable sin is
covering mistakes a second
time, or, mistakes at work.

Mistakes are common for people at work. My
own feeling is that I make a minor mistake a day, a
middle range mistake every week, and a truly
major goof-up once a month. Such is the nature of
work. But mistakes are not to be confused with
incompetence. People have a right to mistakes, but
not to incompetence. And the nature of the world
of work is such that, given proper colleagueship,
supervision, and direction, most mistakes can be
handled and compensated for, most often by
extra work. (That is to be expected.) And while it
might sound Pollyannaish, I really believe that
mistakes and the handling of mistakes provide
some of the most critical opportunities for
learning and growth to capable reflective people.

Further, it is to be expected that some persons
who make mistakes will try to cover them up (not
by redoing the task but by hiding what they know
or lying). As a result, a project director has to be
careful to remain familiar enough with what is
going on to be able to spot the covering of a
mistake, especially a major one. When "covering" does
occur dramatic action is necessary. All must be
made to know that that is the one unforgivable sin
and, if "covering" ever occurs again, that's it.
Termination, firing, is the only alternative.

But, for the most part, mistakes simply have to
be lived with as a fact of life. Often one can only
shrug off the minor mistakes knowing that it
would be more of a mistake to try to undo them than
just to forget them. The middle range mistakes often
have to be made up for by extra work (not that
anyone tells you you have to, it's simply work that
has to be corrected). Regarding the major
mistakes, they not only require effort to undo (some
may be so serious that they cannot be redone) but
they also provide rich learning experiences in
living with the consequences of life. Be clear, major
mistakes generally do have consequences, but most
often the consequences are not calamities if faced
up to.

For me, my primary goal regarding my own
mistakes is to discover them myself and report
them. (This can be read as honesty or practical
realism.) Such reporting does not free one from
the consequences however. It simply is the
development of trust in work relationships. I hope
that my boss can trust me completely: that is,
that he can trust that I will make my mistakes, but
that he will never be surprised by them. I have
found few mistakes that cannot be handled in civil
ways. Covering a mistake, on the other hand, may
mean that the opportunity to redo it is lost and
potentially is disastrous to a project. (If I sound
"preachy" at this point, it is because I feel quite
strongly about this. Much of the work we do in
evaluation is new and exploratory. If staff runs
scared because they are fearful of making mistakes
or taking appropriate risks, then the whole
enterprise is lost. Evaluations are simply risky business.
Bright competent people have the right to
mistakes. Evaluations and evaluators can fail. If
failures are seen as legitimate, then we can continue
to develop our field, both through the successes
and failures of ourselves and our colleagues. But
failures, too, should be published so we don't have
to go on and on making the same major mistakes
in evaluations.) This is most certainly true.

Verity #5. Identifying the laborer who is to be
"in the vineyard," or, selecting a
subcontractor.

Although I do not have a great deal of
empirical evidence about this, I nevertheless am
convinced that every evaluative organization has a
genius of design working someplace in the inner
sanctums of the organization. That person is not
only a genius but often too has E.S.P., in that
she/he seems to be uncannily aware of exactly the
design the contractor has in mind. But the grantor
will never meet this design genius and once she/he
has completed the design, she/he will be irrelevant
to the evaluation. The point I am making is that
the key persons to assess in selecting evaluators are
the people who will actually do the work. They will
make or break the evaluation. Even the project
director is not enough. You must see and make
judgments about the key on-site evaluation staff
member(s). This is most certainly true.

Verity #6. The truth shall make them
free, or, passing by the crotchety old
evaluation director.

And finally, if young researchers are bright
and capable, and if an evaluation director has
given them the opportunity to really use their
magnificent selves and skills, and if he/she believes
that knowledge and skills are really crescive, the
evaluation director will see young evaluators fly
slightly higher and slightly faster than the
crotchety old evaluation director. And that's what it's all
about and is most certainly true.

Conclusion 

Those of you familiar with hermeneutical
principles will recognize that I have used the
classic three point Lutheran sermon style:
Introduction, three points in the body with the central part
being both the longest and most important, and
the third part a miscellaneous section where things
are put that don't fit into the outline. The
conclusion is generally an exhortation. I have presented
my verities. I shall spare you further exhortation.
And that is most certainly true.



One final point. My evaluation colleagues, the
Kansas City Police Department and I have
completed an experiment which has been considered
to be fairly well done. We were very, very lucky.
We worked very, very hard. Most of the things I
am telling you are in hindsight. I may be wrong. I
think I am right. That is most certainly true. Selah.
Amen.



Additional comments on putting together a good
evaluation research team.

Lee Sechrest

The skills involved in carrying out good
program evaluations are special and not widely
available. There are sufficient special characteristics of
program evaluation research to make it unlikely
that researchers without specific experience
and/or training for evaluation will be able to
resolve all the problems that are sure to arise.
Therefore, an administrator wanting to become
involved in program evaluation research will not
maximize chances of successful completion of the
evaluation by relying on the usual sources of
research expertise in his community, e.g., a local
university faculty. Unfortunately, many university
faculty members have no notion that their
capabilities may be in any way limited.

In fact, most administrators will need some
help in locating and recruiting evaluation
researchers. There are several sources for such help.
First, the potential funding agency for the
research will often know a good bit about the local
research community and will be able to make
recommendations based on their experience of
researchers who have the needed expertise and
interest. A second source of information often
available is the directors of other similar
evaluation research projects. If an administrator knows
of evaluations which he or she considers to have
been well done, a good move would be to contact
the evaluators of those projects for advice. Even
though the evaluators are at a considerable
distance, evaluation researchers will often know the
resources available in the community. Finally, the
administrator may inquire locally to determine
whether there are evaluators with experience of
the type needed. The administrator should not be
reticent about asking to examine credentials and
samples of previous evaluation reports. If
necessary, outside help, e.g., from funding agencies,
should be sought in assessing the credentials and
previous work samples. No competent and honest
evaluator will balk at having his or her work
examined carefully.

A good evaluation research team begins with a
highly competent evaluation researcher. That
person will then, ordinarily, be able to put together
the staff to do the evaluation if it is funded. In the
meantime that researcher should be quite willing
to participate in planning the evaluation study and
in preparation of the proposal to be sent to the
funding agency. The greater the input from the
potential research director, the stronger the
proposal is likely to be and the greater the chances of
the ultimate success of the evaluation.



Evaluation of Experiments in Policing:

What are we Learning?

Joseph H. Lewis
Director of Evaluation
Police Foundation

Washington, D.C.



Over the past several years the Police Foundation has been fostering, supporting, monitoring, and publishing results of a variety of
research on the delivery of police services. During that time the Police Foundation has accumulated a valuable fund of information
about the problems in doing police work and in getting it paid attention to in the police community. While police work cannot be
equated with the delivery of emergency medical services, it is believed that there are enough similarities between the two fields to make
at least some of the lessons learned from police work transferable.



It has been a long time since I have done any
work, but I have had the opportunity to learn
from the labors of others. The last five years have
been especially interesting. During that time the
Police Foundation, in collaboration with a number
of police agencies across the country, has initiated
fifteen substantial pieces of evaluation research in
the field of urban policing. Ten experiments are
finished, three are in various stages of evaluation
report completion, and two are still running.

Some experiments have been done by Police
Foundation evaluation staff with support in some
instances by contract research institutions, many
by research institutions under direct contract to
the Foundation. These numbers do not sound
impressive compared to, say, the national debt. But
they do, in fact, constitute a respectable fraction of
the evaluation research in regard to policing that
can be termed consciously formal in the sense that
it is intended to conform, as far as nature will
allow, to the rigorous standards of science. Since
these are a class of social experiments we are
talking about, it will come as no surprise to you that
sometimes the correspondence with scientific
standards of rigor has not been as close as one
could wish. But all of our work has been
conducted, reviewed and reported by those standards.

Much that the Foundation does is of a
different nature, related to removal of barriers to
improvement in personnel and other important
aspects of administration or to more direct efforts at
reform through information exchange and the
like, but all of the activities under direct discussion
here were initiated with the firm intention of
formal experimentation. Each initiation has been the
product of a negotiation between the Foundation
and a police agency. Each negotiation began with
exploration by a Foundation program officer with
police administrators to search out possible issues
of common interest which lie within the strategic
purposes of the Foundation, in policing situations
that appear to lend themselves to productive
research.

When an acceptable issue to test is found in a
climate of circumstances that appears to favor
formal experimentation, the program officer
works with the police agency to help the agency to
produce a proposal sufficiently concrete to enable
our Board of Directors to assess the intrinsic worth
of the idea, in terms of generating nationally, as
well as locally, usable knowledge of substantial
importance to improving policing, and to consider
the cost to develop a program plan for the
experiment and an evaluation design to go with it. This
preliminary proposal will have had, at the very
least, input and advice from me with respect not
only to evaluation design and planning needs, but
also about bringing the statement of program
purpose and process toward measurable, concrete
terms. Often, even at these very preliminary stages
of program development there will have been
more extensive evaluation staff collaboration in
specifying what kind of experiment it will be
attempted to design.

When the Board approves the planning grant
and a sum for evaluation design, the police agency
adds officer and other capacities, including
civilian professional specialists as needed, to the
planning team which will develop the full
experimental design and program of action. Evaluation
capacity is mobilized to work in close conjunction
with the planning team to produce the evaluation
design and work plan so that the experiment and
evaluation are parts of a single, coherent entity
aimed at producing the defined knowledge
specified.

Initial estimates of the experimental design
task, of the capabilities of the police or evaluation
groups to perform, or both, may have been
mistaken. If the design and planning process goes well
but needs more time or other additional resources,
extensions to as long as one year, on one or two
occasions even longer, may be funded. If it should
become clear that a feasible design for formal
experimentation and evaluation is not going to
emerge, no experiment will be funded. Should
another kind of research than an experiment still
seem promising, a proposal for it, prepared
through the full cooperation of the police and the
researchers, would be submitted to the Board for
consideration.

A grant to a police agency to conduct an
experiment or other form of research requires the
agency to commit itself to facilitate collection, and
in some cases to provide, baseline and other data
pertinent to maintenance of the experiment and
conduct of the evaluation. It must also commit
itself to maintenance of experimental conditions for
the planned duration of the experiment, barring
catastrophe. Foundation program officers monitor
and work with the project management staffs of
the police agencies in which they have experiments
or other programs in progress to make sure that
the agencies have the capacities needed to
maintain controlled experiments and are doing so.
Should that not be the case, every attempt would
be made to assist the agency to do so. If
circumstances did not allow for full success but the
agency remained committed to the attempt,
adjustment of objectives might be made if substantial
gains in knowledge could still be expected.
Otherwise funding would be subject to termination.

These, no doubt, simple-appearing
paragraphs compress a great deal of information about
what we have learned about doing evaluation
research in policing. It is the model we believe to be
most useful in our business. We have come close,
much closer perhaps than most, to operating as I
have described. Even when we do, there are
serious problems to deal with.

Development and conduct of experimentation
and evaluative research in these fifteen instances
has provided rich experience in identifying some
of them. Several of your speakers are participating
in this conference because Professor Sechrest
believes that some of our learnings from them may
be transferable to research in the field of
emergency medical services. Our practitioners and
researchers in that field can assess which ones may
be applicable and to what degree that may be so. I
shall not myself attempt to draw many parallels.
There are probably many reasons why I should
not, but one seems sufficient: I don't know enough
about emergency medical services (EMS).

Let us begin to unravel some of these
generalities. Note first that all of the foregoing has
been stated in terms of the interests of a funding
agency, one dedicated by the terms of its charter
and commitment from the Ford Foundation in late
1970, to improvement of policing in the United
States.



There are a number of reasons for this. An
obvious one is that that is the perspective natural
to my present business. Another, however, of
more direct interest for this discussion, is that the
funding experience can be a sort of integrative
mechanism for learning. When we take note over
time, for example, of what the most useful items
are that our funds provide with respect to
initiating or to sustaining an experiment or an
evaluation, or to keeping them in adequate relation one
to the other, we begin to understand which of
them seems special to one circumstance and which
are recurrent and more general in application. It
is the fact of being a funding nexus that lets us
learn the same thing across a variety of projects
about the importance of what our program or our
evaluation people do. Once we have understood
those observations, the findings that seem to be
most general can be used by any agency that wants
to test, in a formal sense, the usefulness of what it
already does or innovations that might improve
the agency's effectiveness.

Finally, this perspective is suggestive of
another important point. When the Foundation
was first chartered, it was expected that a flood of
good ideas about things to try, expressed in terms
of well thought out and specified proposals, would
pour in from police agencies across the country. A
flood did pour in at first, but in general, they were
requests that the Foundation fund conventional
training programs, or a new headquarters, or a
management survey or the like. Those that
referred to a desire to try a new idea often showed
an unawareness of what other agencies were doing
or were not well thought out in terms of specified
objectives, concrete steps to achieve them or
measures of success. In short, it quickly became clear,
even to those of us who did not already know it,
that the Foundation was never likely to be able
simply to hand a check to a police department and
stand back to wait for the inevitable good results.

The problem for the police is that they are
fragmented into some 17,000 forces, each an
island unto itself. They can be islands in two senses
important to this discussion. They have tended
often, as you probably know, to feel defensively
isolated from the communities they serve. In cities
where our surveys have shown, as they invariably
do, that citizens have a high regard for the police
and are supportive, the police tend to underrate
that regard and support. There is an aura of
secrecy about what the police do and how they go
about it.

But, for our purposes, almost more important
is the fact that police agencies are, generally,
insular with respect to each other. Almost all of our
nearly half million police serve their whole careers
in the agency they first join. Lateral movement
except at the highest levels is almost non-existent and
is rare even at the level of chief. Communication
among them about the substance and methods of
their work is generally poor. In the spring of 1974,
the Foundation convened a conference of the
chiefs of patrol of the forces in the 35 largest cities
in the country. That is the first time they had ever
met.

These factors seem to have had consequences
of the following kind. It is rare for police
administrators to be formally trained in management, as
city managers must be, or in business
management. It is rare for police agencies to employ the
many professional or technical skills from
"outside," as many other forms of enterprise that deal
with organizational management and human
service issues find it natural to do. Management
practices common to many other forms of enterprise
are slow to be adopted in policing. State-of-the-art
knowledge or breadth of experience with
problems and practices across differing jurisdictions is
hard to come by in such a setting.

This is why the money the Police Foundation
provides in planning grants goes largely for two
things: "outside" consultants and travel.

Over the last few years we have helped several
police agencies learn how to use psychologists,
sociologists, program analysts, data technicians,
personnel specialists, organizational development
specialists, and others with talents and specialties
from outside the world of policing. It has been
necessary to do so to help police administrators
formulate in concrete terms the ideas they want to
join with us in testing, to help them learn what else
is known that is related to it, to help them select
the most promising ways by which to test their
ideas, and how to make those tests acceptable, with
meaning to patrol or other officers, as well as to
the citizens who are affected by the test or who
may be by the results.

Travel budgets for other than the chief are
small or non-existent in many departments. Even
the chief may be restricted to one or two trips per
year. Travel is often the first item to be cut in
tightened city budgets. A cutter simply has to say
"boondoggle," and wield the axe.

The Foundation has sponsored travel, by
officers at all levels, to other cities that have dealt in
some way with an issue area they wish to explore
that will help them in their planning.

Some have said that providing these two kinds
of aid to police agencies, helping them to open up
to a broader world, both of policing and of the still
wider one beyond, may be among the most useful
things the Foundation does. I would not deny that
possibility. It is, at any rate, clear that we could not
design and plan good research with our police
partners without them.

Does any part of this sound familiar to you as
EMS practitioners and researchers?

Let us move on now from what we have
learned about what it takes to help a willing police
agency design and plan good research to what we have
learned about what it takes to execute a good
research design to produce credible answers about
what works or what does not. To lay the
groundwork, consider what we need to deal with.

Evaluation of the consequences of
experimentation requires, ideally, commonly accepted, well
defined measures of input and output. Measuring
the performance of police requires agreement
about the objectives of policing, what the police
are supposed to deal with, how they are supposed
to behave, and what they are supposed to
accomplish, all in measurable terms and based upon data
that it is feasible to get. It is common knowledge
that measurement of public sector activities is
generally far more difficult than for business where
dollar gains and losses are comparatively easy
yardsticks to apply. Policing provides an excellent
illustration of the complexities of measurement in
the public sector.

Let us trace that idea for a moment. One
origin of the problem is that there generally is not
one public which decides and transmits through
city management what it wants the police to do;
there are several and they are often in sharp
disagreement. Field interrogation, stopping and
questioning citizens, can be proper order
maintenance to some middle class blacks or whites and, at
the same time, harassment to youngsters with long
hair or bushy afros. Some want and need
emergency helping services, from transportation to
medical service, to counseling about domestic
trouble, to solving neighborhood disputes, to
dealing with an insane relative or friend. Others in the
same city would turn to their doctor, their
marriage counselor, their lawyer, or their psychiatrist
for these sorts of service, believing firmly that the
police should "stick to crime" or "solve the traffic
problem" and not be diverted by these, as they
would term them, extraneous, unproductive
demands on their time. And so it goes.

For any particular remedy the police might
apply, there will be disagreement about its use. Is
an arrest the best solution to a problem? People
differ. It is almost automatic for many in and out
of policing to think of good policing as aggressive
policing and to think of high arrest rates as
indicators of good, aggressive policing. But for several
years, many have thought not for some kinds of
behavior the police most often deal with and have
tried to divert young offenders, or drunks, or
others away from the law enforcement system, or
they have wanted to teach police to counsel people
in domestic disputes, partly so as to avoid arrests
whenever possible. Some believe the police should
be cool and impersonal; others, warm, friendly,
interested.

What this means for research is that
no single measure of performance or outcome will
suffice. As many aspects as possible must be
measured and the results laid plain so that any police or
public reader may apply his own relative weights
or values to them.






Another perspective that helps to understand
why evaluation of experiments or assessments of
police performance or of effectiveness are
complex and difficult stems from recognition that little
is firmly known about cause and effect
relationships in dealing with crime; little theory exists that
explains how or why what the police do ought to
affect crime. Only a tiny beginning has been made.
Two examples will help to make the point. It has
been assumed as a rule by many, in and out of
policing, that one-third to one-half of the time of
police officers assigned to street duty must be
spent routinely patrolling the streets to prevent
crime, insure citizen satisfaction with the police
and reduce their fear of crime. Our experiment in
partnership with the Kansas City Police
Department ¹ suggested that quite wide variations in
routine preventive patrol, keeping everything else
constant, had no effect on crime, satisfaction, or
fear that we could find. Another Kansas City
experiment ² that the Law Enforcement Assistance
Administration is funding is beginning to suggest
that, in many instances of even serious crime like
street robbery, citizens wait so long before they call
the police that it does not matter whether the
police hurry or not as far as opportunities for
on-the-spot arrests are concerned. And yet both
police and public have always felt sure that short
response times were good for that. In fact, short
response times are often used, by themselves, as
indications of a good police force. And police
managers coach their publics to expect short
response times to all kinds of calls and they spend
substantial resources on radios and cars,
manpower and computers to make them short, an
expensive proposition.

What this says is that there is not yet much
validated, codified knowledge and that much of what
we think we "know" is not true. Clearly, then, in
the field of policing it is important to test the
conventional wisdom as well as to try out new ideas.
We must expect our lack of knowledge to
complicate our research designs and to increase the risk
of failure for unexpected reasons.

¹ George L. Kelling, Tony Pate, Duane Dieckman and Charles E. Brown, The
Kansas City Preventive Patrol Experiment (Summary Report, 1974; Technical Report,
1975), Police Foundation.

² Deborah K. Bertram and Alexander Vargo, "Response Time Analysis Study:
Preliminary Findings on Robbery in Kansas City," The Police Chief, May 1976.

The effects we are looking for are often subtle
or modest in size. The measurement tools so far
developed are not always very sharp. Many believe
that, to some unknown degree, much criminal
behavior stems from economic and social conditions.
Young people are being arrested for a large and
growing amount of it, up to half in many places.
The police cannot keep people from being young,
or poor, or male, or black. What police can do can
affect some kinds of criminal behavior some of the
time in some places. When we try to use the
amount of crime reported to the police, and that
the police include in their records, to determine
whether crime is changing, we run the risk that
any changes we may see may be caused by
differences in what people choose to report to the
police. They may also be caused by changes in the
way the police treat the reports coming in. These
problems can be guarded against for certain kinds
of crime by conducting victimization surveys of
citizens. Data from such surveys do not have police
bias in them but surveys have some problems of
their own. What looking for modest effects with
imperfect measurement instruments demands is
measurement of any given effect from as many
perspectives as possible. Such multiple
perspectives when applied to a sizable number of outcome
measures can give confidence about what did or
did not happen even though, taken singly, most
measures would be too weak to do so.
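
The reasoning about many weak measures can be illustrated with a small
sketch. It is not a method described in the talk; it assumes, purely for
illustration, that the outcome measures are independent and that, if
nothing really changed, each is as likely to move up as down. Under those
assumptions a one-sided sign test gives the chance that so many measures
would agree by luck. The function name and the numbers are hypothetical.

```python
from math import comb


def sign_test_p_value(n_measures: int, n_agreeing: int) -> float:
    """One-sided sign test: probability that at least n_agreeing of
    n_measures independent measures move in one given direction when
    each is equally likely to move either way (no real effect)."""
    if not 0 <= n_agreeing <= n_measures:
        raise ValueError("n_agreeing must lie between 0 and n_measures")
    # Count the outcomes with n_agreeing or more agreeing measures.
    tail = sum(comb(n_measures, k) for k in range(n_agreeing, n_measures + 1))
    return tail / 2 ** n_measures


# Hypothetical illustration: one weak measure alone shows little,
# but ten of twelve agreeing measures are hard to ascribe to chance.
single = sign_test_p_value(1, 1)    # 0.5
many = sign_test_p_value(12, 10)    # 79 / 4096, roughly 0.019
```

In practice, measures taken from the same experiment are usually
correlated, so the independence assumption overstates the confidence
gained; the sketch shows only the direction of the argument.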

But is it not possible to bypass all of these
complications by noting that, since the business of
the police is to provide services to the public, direct
measures of citizen satisfaction with police service
would be the ultimate indicator of success or
failure? Unfortunately this is not now a real
possibility. If the lack of hard knowledge and the other
complications we have mentioned are linked back
to the earlier point about insularity of police with
respect to their public and the secrecy that
surrounds what they do and how they do it, the result
is that citizens have little or no basis for knowing
what it is reasonable to expect their police to
accomplish or how to judge whether how they go
about it is productive or wasteful. This denies
evaluators the straightforward use of indicators of
citizen satisfaction as a measure of adequacy of
police performance or effectiveness.

What have we learned about conducting
research, experimentation and evaluation in such an
environment in partnership with police agencies?
Let us go back to the condensed summation with
which we began to see what those simple looking
statements mean in practice.

We said that a police agency that wants to test
an idea must commit itself to facilitate collection,
and in some cases to provide, baseline and other
data pertinent to maintenance of the experiment
and conduct of the evaluation. The importance of
baseline data, that is, data that shows what
conditions are before a contemplated change is begun, is
pretty clear. Without it, it would not be possible to
make serious before and after comparisons to
show whether any change took place when a new
technique or other change was tried. But what
many administrators whose experience has been
concentrated on operations, making things
happen, are not prepared for is that collecting such
data can be a massive, time-consuming affair.
Commonly, it has been their experience that it is
difficult to gear up their organization to generate
support for change or innovation, or to challenge
accepted wisdom. We will come back to this point
in a moment. Once that enthusiasm has been
generated, it is natural to want to act before it
dissipates. What has to be done in practice is to
incorporate that baseline data collection process as an
integral part of the agency's preparations for the
experiment. It is easier to do so if the issue to be
addressed by each test is as concrete as possible.
The measurement complications and lack of
theoretical knowledge of policing to which we have
previously called attention also suggest this course.

The process of bringing an organization to the
pitch of enthusiasm often generated to facilitate
launching and support for maintaining an
experiment or other kind of innovation in policing can
result in a state of overpromise leading to
subsequent disillusion. It is something like the politics
of congressional legislation: so much has to be
promised to secure passage that any action bill is
almost automatically doomed to be seen as a
failure when it is implemented. We noted earlier that
most of the effects the police can produce by
changing what they do are expected to be modest
in size. Disillusionment from overpromising,
both of officers and of the public, is frequent and
makes further change more difficult. The
shrewdest chiefs have learned to focus on the trying of
better ideas or the testing of old ones to make
improvement rather than on expectations of
eliminating crime or citizen fear by any single
thing, however major, their departments, by
themselves, can do. This is a hard-learned but valuable
lesson for other managers of service systems.

We also said earlier in our initial summation
that a cooperating police agency commits itself to
maintenance of experimental conditions for the
planned duration of the experiment, barring
catastrophe. Let us deal with catastrophe a little
later. Experiments do not maintain themselves. By
definition, they constitute the maintenance of
strange conditions. Organizations have enormous
capacities for absorbing attempted change so that
when one looks again, all is as it was before. There
are many reasons for this. Practitioners may
believe that the way they normally do their work is
best; they may feel that a change to be tested risks
the safety of their beat; individuals may fear a loss
of relative power or prestige, or even pay.
Collectively the effect is similar to inertia: an
organization tends to keep on doing whatever it has been
doing in the same way it always has unless an
inside or outside force is brought to bear to change
it.

To be serious about research that requires
experimental conditions to be set up means that the
police administration needs to decide in advance
how it will know that those conditions are in being
and to set up explicit means (data or indicators to
watch and people to do it) for continuously or
periodically monitoring whether they are. Such a
monitoring capacity must be able to feed
information to the boss as to what is off the track and what
change will restore it. It is then up to the boss to
take the necessary action to do so. If some police
activity is to be stopped in defined areas, is it
stopped? Does it remain so? If an activity, or the
number of officers, is to be increased at certain
times or in certain areas, is that happening? If two
kinds of officers, say male and female, are to be
assigned to tasks equally, in this case without
regard to sex of officers, is that being done, or are
men subtly protecting women?
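
The monitoring questions above can be pictured as a periodic check of
observed indicators against the conditions the experiment calls for. The
sketch below is only an illustration under stated assumptions: the
indicator names, the allowed ranges and the weekly figures are all
hypothetical, and nothing here describes a system the Foundation or any
police agency actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """A planned experimental condition, expressed as an allowed range
    for one indicator (names and ranges are hypothetical)."""
    indicator: str
    low: float
    high: float


def off_track(conditions, observed):
    """Return (indicator, observed value, reason) for each condition
    that is not being maintained; missing data counts as off track."""
    problems = []
    for c in conditions:
        value = observed.get(c.indicator)
        if value is None:
            problems.append((c.indicator, None, "no data collected"))
        elif not c.low <= value <= c.high:
            problems.append((c.indicator, value, "outside planned range"))
    return problems


# Hypothetical plan: patrol stopped in some beats, officers added in
# others, and calls assigned without regard to sex of officers.
plan = [
    Condition("patrol_hours_in_no_patrol_beats", 0.0, 0.0),
    Condition("officers_on_duty_in_added_beats", 8.0, 12.0),
    Condition("share_of_calls_assigned_to_women", 0.45, 0.55),
]
weekly = {
    "patrol_hours_in_no_patrol_beats": 3.5,
    "officers_on_duty_in_added_beats": 10.0,
}
report = off_track(plan, weekly)
for indicator, value, reason in report:
    print(f"{indicator}: {value} ({reason})")
```

Treating a missing indicator as a problem in its own right reflects the
point made above that the means of watching, as well as the conditions
themselves, have to be set up in advance.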

In practically every case, the cooperating
police agency has required the continued internal
assistance of some of the same kinds of consultants
that were provided to help with the initial design
and planning of the research. To these have been
added police management and operational talent
which together form a program management
group to run the research program on behalf of
the agency.

Often, and what the Foundation especially
likes to see, the city government, at the
recommendation of the police administration, has created
the necessary budgeted positions to institutionalize
the civilian additions to the police agency's
capacity to plan and manage research after the first year
or so of Foundation support. Such bodies often
assume wider planning, analysis, and research
management capacities that extend the agency's
ability to innovate and test what it does well beyond
the initial levels the Foundation has sponsored.
The Kansas City response time study was
designed, funded and conducted, including the
presently ongoing analysis of results, through the
efforts of the research capacity originally established
in the course of Police Foundation
experimentation in that department.

We had said that Foundation program officers
monitor and work with project management staffs
to make sure that the agencies have the capacities
needed to maintain controlled experiments and
are doing so. The energy and attention of our
program officers have often been as important as
our funding in securing the successful completion
of research. When the indicators show that some
condition is not being maintained as agreed, it may
be that a shift of existing program resources will
help to get it back on track. A staff visit to another
department where a similar problem has been
solved may help the agency's project management
more than additional computer time that may be
budgeted. Or a computer specialist may be able to
solve a programming problem to help get better
data for controlling the experiment. Flexibility in
shifting experimental program resources has often
helped to make the most of research
opportunities.

The police agency's own monitoring process is
designed during the early planning phase that we
have talked about when the experimental and
evaluation designs are being worked out together.
The evaluation team works with the agency's
project management staff and helps to specify what
indicators will show whether the experiment is on
track and assists in designing the data collection
scheme that will produce those indicators. Once
the experiment is running, the evaluators monitor
the quality of the indicators and help the agency to
improve the quality where it is not adequate for
the purpose. In every case so far that has been
necessary. One reason is that data adequate for
everyday familiar operations are often not
sufficient for doing research or trying out new ways to
do things; the level of detail may be too low or not
all the kinds of data needed may be routinely
collected. Another is that many police agencies are in
some state of transition in their use of computers.
This means that, even though the computer is
producing data about an operation, the operation
may still be being managed and run by the
pre-existing method of control. In such cases, errors in
the computer data may not be noticed. In any case,
they do not matter. When, for example, adherence
to dispatch discipline in a team policing
experiment forces use of computerized dispatch data,
errors in the data suddenly make a difference.
Before that, no one knew that there were any.

The four-way feedback between police agency
program management and evaluators in the field,
between police and Foundation program officers,
between evaluators in the field and evaluation
management and, finally, between Foundation
program and evaluation management, has been
responsible, at its best, for getting the most out of
a research opportunity to help a police agency gain
knowledge about a question important to its own
purposes, as well as to policing nationally. When
communications in one or more of the links has
been incomplete or slow, results have tended to be
less satisfactory. This may happen because the
capacity or behavior of the police agency or
evaluation staff could not be adjusted rapidly enough.

When circumstances beyond control prevent
realization of initial expectations for an
experiment, it is sometimes true that less ambitious but
still valuable research objectives can be reached if
the facts are learned soon enough that police
agency and Foundation management, both
program and evaluation, can agree on the changed
research specification. If events preclude that, it is
still essential that these feedback loops, especially
from evaluation staff, operate so as to make clear
to all concerned how a given state of affairs differs
from what was planned. For example, it can
happen, as it can in most public or private
bureaucracies, that a prime source of inertia or resistance
to change is middle management. A
decentralization plan, perhaps such as neighborhood team
policing, when implemented, will shift operational
decision-making authority downward away from
middle management. If other aspects of the
change in organization and operations do not
compensate for that in ways perceived as adequate
by middle management, members of that group
may well resist the maintenance of the new
arrangement so that, in a few months or a year or
two, authority they deem important will become
re-centralized and the planned change will really
not exist except, perhaps, for superficial
appearances. Should such a state of affairs be detected, it
would be important for a chief to know as soon as
possible so that he could decide whether he has the
political power, internal and external to his
agency, to deal with the situation. (We will come
back to this point again a little later.) If
circumstances change, it is important for all concerned to
know that the evaluation report will say that.

By now we have seen that, in all cases, operating
agencies have added new capacities to themselves
to enable them to plan and conduct serious
research. The sorts of capability adequate for
operating as usual are not adequate for an agency that
really wants to advance its knowledge of, and to
improve, its own art and practice. The sorts of
additional talent that are needed do not ordinarily
grow in police agencies so they must be brought in
from outside where they do, from universities and
research groups, from technical and professional
schools, from other backgrounds and experiences.
When this has happened, not only has the agency
been able to conduct research and tests that it
wanted to do, but also, it has been able to improve
its knowledge and control, for management and
operational use, of its data and information
systems; it can analyze its own internal operations; it
has been able to plan, secure funding for, and
execute additional research and test or other
improvement projects on its own. Most importantly,
the viewpoint of the agency can change to one of
open questioning of what it and other agencies do
and how they do it, making learning from
experience a continuous, explicit process, and innovation
and change based upon such learning, natural.
This is a sharply different atmosphere from the
isolated, defensive, rigid climate which has
pervaded agencies that have not moved.

Adding such capacities, even only one or two
people bringing new kinds of talent and training
not "slotted" in the organization, costs money.
Sometimes part of the operating force or of
management that is to participate in an experiment or
other research needs to be specially trained. That
costs money, sometimes at overtime pay rates for
large numbers of officers, plus the cost of
instruction. Sometimes additional or special equipment is
needed (although the Police Foundation has tried
to keep its contribution to equipment at a
minimum), and that costs money. City or county
councils do not, even in relatively good times,
readily make money available for research and
experimentation; they prefer to fund only traditional
or tested items. If it were not for that, police or
other agencies could go ahead and add whatever
abilities are needed and do their own research and
testing of what they do or of new ideas. As it is,
with rare exceptions, outside funding sources must
always pay the bills for initiating test and
innovation. And another need for outside funding is to
make evaluation credible.

It may seem strange that we have come this far
in discussing evaluative research in policing with
only cursory mention of evaluation. We have said
that evaluation and program designs must be
developed and planned together as parts of a
coherent whole; that evaluation staffs help police
agency project management staffs to design and
test internal project monitoring and evaluation
plans and data systems for them; that evaluation
staffs monitor these monitoring systems and
independently assess the state of maintenance of
experimental conditions. We have said that
evaluators provide crucially important feedback
about that to the agency and to the funding
source, to both program and evaluation
managements. But that is all.

One reason we have not said more is that
other speakers at this Conference have already
done so. But the most important reason is that we
are dealing with first things first. An agency chief
and administration that really wants to test an
idea, is fully committed to maintaining agreed
upon experimental conditions for the duration of
the test, has the capacity to design and plan a good
experiment and the ability to monitor and to take
whatever action is required to maintain it, can
make the evaluation task, inherently difficult at
best, worth trying. If the agency chief and his
administrators, either through lack of interest or
impatience, lack of understanding of the
commitment they have made and what it may require
them to do, or for any other reason, do not
maintain the experimental conditions, the planned
evaluation is impossible and no amount of
evaluation talent can make it otherwise. So we have
concentrated here on what service practitioners need
to do to make experimentation and evaluation
feasible.

Given that the conditions for research and
experimentation leading to opportunities for good
evaluative research have been established in an
agency, why should it not go ahead and do its own
evaluations? For many purposes it should. This
will be particularly true for tracing of internal
operating processes and attempts to change them
and for some experiments which can be evaluated
at relatively low cost. An ability to do so will not
only enhance the ability of such an agency to do its
own work better but will make it a much smarter
customer for outside research it may wish to
contract for, a point of no small importance when
one is aware of how vulnerable most agencies are
to the purveyors of outside "expertise" and how
little unsophisticated agencies benefit from such
"services."

But two factors work against the agency doing its
own evaluations in many important circumstances.
One is that if the agency wishes to make a
substantial contribution to better understanding of a
police issue that has national importance, it is
essential that the evaluation of results of an
experiment done for that purpose be, and, most
importantly, be seen to be, disinterested. A separately
funded, independently managed evaluation staff
to measure impact of the conventional wisdom or
new technique or operation being tested is
essential to credibility, though even that does not
necessarily assure it. That is why, in all experiments
sponsored by the Foundation, the evaluation is
funded by our Board in a budget entirely separate
and distinct from the budget for the program to be
evaluated; the evaluation capacity, whether
internal to the Foundation or contracted for, is
managed and directed entirely separate from
program management, and both designs and draft
evaluation reports are extensively reviewed by an
outside Evaluation Advisory Group, members of
which have no vested interest in the success or
failure of a program or of a police agency. A more
complete separation would occur if the
Foundation sponsored the evaluation of an experimental
program funded by others. This has happened but
is rare, partly because so few experimental
programs well enough controlled to be worth formal
evaluation are being funded or are occurring
naturally, partly because others who fund programs,
not unnaturally, want to reap the potential
benefits which may come with publishing reports of
good outcomes. Since experience has taught us
the, literally, crucial importance of program
monitoring and control of experimental
conditions, the separation of program and evaluation
management but still within the Foundation rubric
has seemed to us so far a most useful compromise
between assurance of as high quality research as
the situation may allow and the high external
credibility of results.

The other reason why evaluation of
experimental impact must most often be external is cost.
It is not unusual that baseline data that must be
collected even before it can be known that the
experiment will run successfully, or even for sure
that it will start, can easily cost $100,000. A
completed evaluation of a major experiment, such as
the Kansas City Preventive Patrol Experiment,
conducted by Dr. George Kelling and the Police
Foundation Kansas City Evaluation Staff, with
technical support from Midwest Research
Institute, may cost $650,000 to $700,000. The
five-year (from start to design to publication of report)
Urban Institute evaluation of the Cincinnati
neighborhood team policing project known as
ComSec will have cost well over $1 million when it
is completed, this despite the effective efforts of
Alfred Schwartz, who managed it, to keep the costs
as low as possible. Such costs come about through
the inherent difficulty of answering the questions
we are attempting to deal with, however simple
they may sound, in the face of the complexities
about measurement in policing to which we
alluded earlier and with the rather blunt tools at our
disposal. In order to say whether sex is a bona fide
basis for exclusion of policewomen from patrol, it
was "simply" necessary to determine whether some
women could perform as well on patrol as
acceptable male officers. Given the disagreements about
what patrol officers should do, how they should
behave and what they should be able to deal with,
it was necessary for Peter Bloch, in directing The
Urban Institute evaluation of policewomen on
patrol in the District of Columbia, to report in the
summary findings three measures of comparability
of assignment, 23 measures of performance, three
of citizen attitudes and 13 of police attitudes. This
experiment took two years and cost over $300,000.

Few police agencies ever have these levels of
funding free of operational commitment. For
major evaluations, outside funding is almost
always a necessity.

We have set out in simple terms what an
agency needs to do to participate effectively in
evaluative research. But why should they?

It is common for administrators of all kinds to
believe that evaluations of programs they direct
are threatening, that such evaluations may cast
them in a bad light if the results are not positive, not
just the program. Police chiefs or other police
administrators are no exception to this tendency.

Not only that, but there is positive political
potential in starts that have no endings. The value
and power of starts must not be underrated. Any
study of experienced specialists in bureaucratic
survival is likely to show that they understand and
make full use of this principle: that starts of new
projects, new contracts, almost anything, can be
announced with fanfare, can be made to seem
important and good simply by rhetoric, and can lead
to gains in image, all at relatively little cost since
they are often paid for with outside money.
Endings can too often be, at best, modest as compared
to opening rhetoric, at worst, downright
damaging. The thing to do is to start as often as possible,
let the project disappear quietly when that money
is gone, and bury the disappearance even more
deeply by new starts. Until recently this has
worked well for any administrator who chose, or
unwittingly found himself in, this cycle. Now some
law enforcement outside funding is tied to
evaluation commitments and some of these will be
implemented. But the relative power of the start is
still a force to reckon with. It does not invite
evaluation.

Not only that, but some police administrators
who begin well designed, purposeful research in
good faith, on matters that they intend to result in
real change, responsive to the knowledge they
hope to gain, can be disappointed part way
through the process. It is natural for operationally
oriented people, like police chiefs, to want to
move; they live on short time scales, where
palpable action counts. Sometimes they get impatient
with evaluators who do not know what the results
of an experiment show as soon as the last data are
collected. It may take as long as a year to analyze
and synthesize the vast quantities of data involved
in major evaluations. In the meantime, the chief
may feel there is a real cost to waiting. It can
happen that he has unreal expectations of the
knowledge analysts have and the use they can make of it.
He may not know that, with rare exceptions,
operational judgments about "what happened" in an
experiment are still best made by his own
operational staff, not by analysts despite the piles of raw
data they may have. Their contributions to
empirical knowledge come from their ability to analyze
and ultimately to understand the meaning of
complex data sets. Evaluators, for their part, may feel
sympathy for the chief's sense of need and try to
give interim indications earlier than they find their
knowledge of the data allows. This situation is a
potential source of irritation to both police agency
and evaluation staffs. Good feedback loops and
patience are needed to avoid or correct unreal
expectations of each other by these two very
different kinds of people.

Not only that, but we have said that the chief
must be prepared when he undertakes to conduct
an experiment to discipline people in his own
agency if they do not support or if they interfere
with maintaining necessary experimental
conditions. People have been removed from positions or
reassigned. The internal political costs to do that
can be high.

Not only that, while no one would expect
experimental conditions to be maintained that
consciously jeopardized the safety of citizens, and it is
understood in every case that a chief will stop an
experiment in which the evidence shows that that
is taking place, nevertheless the chief is taking
risks when he starts an experiment. He risks losing
public support of citizens who do not understand
what he is doing to assure no significant change in
their safety during an experiment. He may feel
that he may risk losing support of his city
management if results are not favorable. These risks
are real. The average tenure of police chiefs in this
country is only about three years. Survival is his
main preoccupation, and he well knows the
whimsical nature of the determinants of his tenure: a
replaced mayor or manager or one breaking
scandal which catches him with surprise can
overbalance precious years of satisfactory performance.

What are the inducements to accept these risks
and challenges? Why is it that police agencies have
attempted as many as four formal experiments at
once? (That, we learned, is three too many even
for a department with more management capacity
than most. The concentration of attention at the
highest level to insure that one major experiment
can be controlled, along with running the
department on a day-to-day basis, no small job in
itself, dictates attempting only one major
experiment at a time.) The forces that lead to doing so
must be powerful.

There are environmental ones. The public,
the Federal government, community groups, and
scholars have been applying pressure for
improvements in police civility and effectiveness for
about a decade. When a city council tells a chief to
show the effectiveness of a practice that has
become controversial or abandon it, the chief can
become more receptive to formal testing. In that
process, elements of his department can see and
seize upon the opportunity to plan, secure his
approval, get financial support for, and test a
different concept of policing, further responsive to the
city council's concerns, which changes the role of a
patrol officer. In three years, looking back, John
Boydstun of System Development Corporation
directed evaluations of both the San Diego Field
Interrogation and Community Profile experiments
and the department is now engaged with us in a
most complex and difficult experiment to attempt
to measure the relative desirability of one- versus
two-officer staffing of patrol cars. The department
has committed itself to and is engaged in adopting
Community-oriented Policing throughout its
patrol force. The chief and the department are
looking ahead to planning more tests of patrol
practice.

What began largely as a response to
environmental pressure is now an accepted mode of
working. This has happened in other police agencies
too, because there are many in policing, chiefs and
others, who feel strongly the need to learn and
change and will respond to opportunity. The
Foundation sometimes represents such an
opportunity. So, internal forces can also be strong.

Foundation funding is another reason.
Bringing in external funding can have political value in
itself. But, in most instances, Foundation program
grants are small compared to the police budgets
they might be thought to influence. Foundation
funding certainly has facilitated the thoughtful
testing of ideas by those police agencies that wish
to do so, but, by itself, could not do more than
that. Expenditure of $30 million on police
research and reform over a period of some eight
years cannot be expected to force the changing of
an enterprise that will have spent, perhaps, well
over $50 billion or more over that time span.

But change, and research and experimentation
in policing is going on increasingly. A principal
reason seems to be that many leaders in policing
have concluded that this is the distinguishing mark
of leadership: to be open, to query, to test in a
formal sense and then apply what is learned and
move forward by such reasoned steps. Others, who
wish to be seen as leaders in their own right, are
finding that this is the way to do so credibly. They
are joining forces with the earlier innovators. This
is the basis of the strength that is now showing,
despite how much more needs doing.

One caution is due to those who would follow
in this excellent path. The definition of success
must be fully understood. It is customary for
almost any administrator or program manager,
including those in policing, once he has decided
what to do, to commit himself to the success of the
program or practice. He commonly does so in such
a way that if it fails, he fails. Hence his uneasiness
about evaluation. The leading innovators'
approach is different. They focus their attention on
the problem to be dealt with, and they commit
themselves to a fair test of the most effective
approach they can devise or find at the time. If the
test shows that not to be effective, they make
changes or apply another technique or practice
and test again. They do not fail when a program
does not operate or deliver as expected. They only
fail if they do not try another approach improved
by what they learned in the test.

An evaluator measures the success of an
experiment, not in terms of whether the outcomes
were as expected or hoped for by the agency, but
rather, in terms of whether he knows what
happened. (This difference can lead to friction.) The
only failure an experiment can have is not to
know. The leading innovators in policing have
adopted some of that philosophy; innovators in
other kinds of enterprise may find it useful.






Biographical Sketches

Jan Acton is an economist with Rand Corporation,
currently working on energy problems. He did his
undergraduate work at San Diego State College and
completed his Ph.D. at Harvard in economics. His doctoral
thesis was an assessment of strategies for treating
victims of heart attack. He also analyzed several measures
for valuing the lives that might be saved by emergency
interventions.






William Bieck is Principal Investigator on the Response
Time Analysis Study, a five-year project funded
through the National Institute of Law Enforcement and
Criminal Justice, for the Kansas City, Missouri, Police
Department. Prior to joining the Kansas City, Missouri,
Police Department, he was employed by the Police
Foundation as an Observer on the Kansas City
Preventive Patrol Study. This experience enabled him to
monitor patrol operations first hand, having
accompanied over 56 officers across all watches for a period of
14 months. Before his employment with the Police
Foundation, he was an Assistant Professor of Sociology
at Nebraska Wesleyan University in Lincoln, Nebraska,
and an Instructor in the Department of Law
Enforcement and Corrections for the University of Nebraska at
Omaha, a position he held for seven years. Mr. Bieck
has a B.S. in Psychology and an M.A. in Sociology.

Professor Robert F. Boruch is Director of the
Methodology and Evaluation Research Division,
Psychology Department, Northwestern University, and
current President of the Council for Applied Social
Research. He is a coauthor of Social Experimentation and an
editor of Experimental Tests of Public Policy; he has
published over thirty journal articles dealing with
methodological, managerial, and ethical problems in
research. Dr. Boruch is a member of advisory panels of
the National Academy of Sciences, the American
Psychological Association, and consults frequently for
Federal agencies on research planning and design.

Russell D. Clark III is a social psychologist and
Associate Professor at Florida State University. He did his
undergraduate work at Tarkio College and his graduate
work at the University of Kansas. Aside from work on
attitude measurement he has studied the influence of
groups on decision making and the factors influencing
helping behavior.

Linda Victor Esrov's educational background is in
experimental psychology. She received her B.A. from
Temple University and a Ph.D. from Northwestern
University. She also completed a two-year post-doctoral
fellowship in evaluation research with Lee Sechrest at
Florida State University and has been involved in a
number of evaluation projects concerning emergency
medical services.

Lieut. Colonel Lester Harris has been a member of the
Kansas City, Missouri Police Department for twenty-two
(22) years and is currently assigned as Assistant Chief of
Police. Past assignments include patrol, instructor of the
Police Academy, Commander of Planning and
Research, Commander of a patrol division, Commander of
an investigations division, Assistant Commander of both
the Administration Bureau and the Operations Bureau
and Commander of the Services Bureau. He is a 196^
graduate of the Southern Police Institute and has
attended Central Missouri State University, majoring in
Criminal Justice Administration.

George Kelling received his B.A. degree from St. Olaf
College, his Master's degree in social work from the
University of Wisconsin-Milwaukee, and his Ph.D. in
social work from the University of Wisconsin-Madison.
Prior to beginning his work with the Police Foundation,
where he is currently employed, he was involved in
probation and parole activities, and institutional work with
aggressive youngsters. He was also an Assistant
Professor of Social Work at the University of
Wisconsin-Milwaukee. Since joining the Police Foundation in 1971,
Kelling has worked on evaluation studies in Dallas and
Kansas City, and is now involved in a large scale study
of police foot patrol in several cities in New Jersey.

Joseph Lewis is Director of Evaluation at the Police
Foundation in Washington, D.C. His first degree was
from the University of Maine in Electrical Engineering,
but he received a subsequent Master's degree in
Economics and Business Administration. After working as
an engineer for Consolidated Edison and the U.S. Navy,
Lewis had a brief and successful career in private
industry before joining the Office of the Secretary of
Defense. From there he went to the Institute for Defense
Analysis where he developed and directed Command
and Control Activities of the Weapons System
Evaluation Group. In 1968 he joined the Urban Institute staff
as Director of the Urban Governance Research Program
and remained there until 1971 when he assumed his
present position at the Police Foundation.

Lee Sechrest received all three of his academic degrees
from the Ohio State University, with a major in clinical
psychology. He taught for two years at Pennsylvania
State University before going to Northwestern
University, where he remained for fifteen years. During his
tenure at Northwestern, Sechrest became interested in
program evaluation and in health services research and
was instrumental in developing training programs in
both those areas. He moved to Florida State University
in 1973, where he is Professor of Psychology. He is a
past member of the Health Services Research Study
Section and is currently involved in work on assessing
performance of emergency medical technicians.






Current NCHSR Publications

The following National Center for Health Services
Research publications are of interest to the health
community. Copies are available on request to
NCHSR, Office of Scientific and Technical
Information, 3700 East-West Highway, Room 7-44,
Hyattsville, Maryland 20782 (tel.: 301/436-8970).
Mail requests will be facilitated by enclosure of a
self-adhesive mailing label.

PB and HRP numbers in parentheses are order
numbers for the National Technical Information
Service (NTIS), Springfield, Virginia 22161 (tel.:
703/557-4650). Those publications which are out
of stock are indicated as available from NTIS.
Prices may be obtained from the NTIS order desk
on request.

Research Digests

The Research Digest Series provides overviews of
significant research supported by NCHSR. The
series describes either ongoing or completed
projects directed toward high priority health services
problems. Issues are prepared by the principal
investigators performing the research, in
collaboration with NCHSR staff. Digests are intended for
an interdisciplinary audience of health services
planners, administrators, legislators, and others
who make decisions on research applications.

(HRA) 76-3144 Evaluation of a Medical Information System
in a Community Hospital (PB 264 353)

(HRA) 76-3145 Computer-Stored Ambulatory Record
(COSTAR) (PB 268 342)

(HRA) 77-3160 Program Analysis of Physician Extender
Algorithm Projects (PB 264 610)

(HRA) 77-3161 Changes in the Costs of Treatment of Selected
Illnesses, 1951-1964-1971 (HRP 0014598)

(HRA) 77-3163 Impact of State Certificate-of-Need Laws on
Health Care Costs and Utilization (PB 264 352)

(HRA) 77-3164 An Evaluation of Physician Assistants in
Diagnostic Radiology (PB 266 507)

(HRA) 77-3166 Foreign Medical Graduates: A Comparative
Study of State Licensure Policies (PB 265 235)

(HRA) 77-3171 Analysis of Physician Price and Output
Decisions

(HRA) 77-3173 Nurse Practitioner and Physician Assistant
Training and Deployment

(HRA) 77-3177 Automation of the Problem-Oriented Medical
Record



Research Summaries

The Research Summary Series provides rapid access
to significant results of NCHSR-supported
research projects. The series presents executive
summaries prepared by the investigators at the
completion of the project. Specific findings are
highlighted in a more concise form than in the
final report. The Research Summary Series is
intended for health services administrators,
planners, and other research users who require recent
findings relevant to immediate problems in health
services.



(HRA) 77-3162 Recent Studies in Health Services Research,
Vol. I (July 1974 through December 1976) (PB 22^460)

(HRA) 77-3176 Quality of Medical Care Assessment Using 
Outcome Measures 

Policy Research 

The Policy Research Series describes findings from
the research program that have major significance
for policy issues of the moment. These papers are
prepared by members of the staff of NCHSR or by
independent investigators. The series is intended
specifically to inform those in the public and
private sectors who must consider, design, and
implement policies affecting the delivery of health
services.

(HRA) 77-3182 Controlling the Cost of Health Care
(PB 266 885)

Research Reports 

The Research Report Series provides significant
research reports in their entirety upon the
completion of the project. Research Reports are
developed by the principal investigators who
conducted the research, and are directed to selected
users of health services research as part of a
continuing NCHSR effort to expedite the
dissemination of new knowledge resulting from its project
support.

(HRA) 76-3143 Computer-Based Patient Monitoring System
(PB 266 508)

(HRA) 77-3152 How Lawyers Handle Medical Malpractice
Cases (HRP 0014313)

(HRA) 77-3159 An Analysis of the Southern California
Arbitration Project, January 1966 through June 1975 (HRP
0012466)

(HRA) 77-3165 Statutory Provisions for Binding Arbitration of
Medical Malpractice Cases (PB 264 409)

(HRA) 77-3184 1960 and 1970 Spanish Heritage Population
of the Southwest by County

(HRA) 77-3188 Demonstration and Evaluation of a Total
Hospital Information System

(HRA) 77-3189 Drug Coverage under National Health
Insurance: The Policy Options

(HRA) 77-3191 Diffusion of Technological Innovation in
Hospitals: A Case Study of Nuclear Medicine (in
preparation)

Research Management 

The Research Management Series describes
programmatic rather than technical aspects of the
NCHSR research effort. Information is presented
on the NCHSR goals, research objectives, and
priorities; in addition, this series contains lists of
grants and contracts, and administrative
information on funding. Publications in this series are
intended to bring basic information on NCHSR and
its programs to research planners, administrators,
and others who are involved with the allocation of
research resources.

(HRA) 76-3136 The Program in Health Services Research
(Revised 9/76)

(HRA) 77-3158 Summary of Grants and Contracts, Active
June 30, 1976






(HRA) 77-3167 Emergency Medical Services Systems Research
Projects (Active as of June 30, 1976) (PB 264 407, available
NTIS only)

(HRA) 77-3179 Research on the Priority Issues of the National
Center for Health Services Research, Grants and Contracts
Active on June 30, 1976

(HRA) 77-3183 Recent Studies in Health Services Research,
Vol. II (CY 1976)

Research Proceedings

The Research Proceedings Series extends the
availability of new research announced at key
conferences, symposia and seminars sponsored or
supported by NCHSR. In addition to papers
presented, publications in this series include
discussions and responses whenever possible. The series
is intended to help meet the information needs of
health services providers and others who require
direct access to concepts and ideas evolving from
the exchange of research results.

(HRA) 77-3138 Women and Their Health: Research
Implications for a New Era (PB 264 359, available NTIS only)

(HRA) 77-3150 Intermountain Medical Malpractice (PB 268
344, available NTIS only)

(HRA) 77-3154 Advances in Health Survey Research Methods

(HRA) 77-3181 NCHSR Research Conference Report on
Consumer Self-Care in Health

(HRA) 77-3186 International Conference on Drug and
Pharmaceutical Services Reimbursement






BIBLIOGRAPHIC DATA SHEET

1. Report No.
   NCHSR 78-46

2.

3. Recipient's Accession No.

4. Title and Subtitle
   EMERGENCY MEDICAL SERVICES: RESEARCH METHODOLOGY; CONFERENCE
   HELD IN ATLANTA, GEORGIA, SEPTEMBER 8-10, 1976; NCHSR Research
   Proceedings Series

5. Report Date
   December 197

6.

7. Author(s)
   Lee Sechrest (conference director/editor)

8. Performing Organization Rept. No.

9. Performing Organization Name and Address
   Jacksonville Experimental Health Delivery System, Inc.
   1045 Riverside Avenue (Suite 275)
   Jacksonville, Florida 32204

10. Project/Task/Work Unit No.

11. Contract/Grant No.
    HSM 110-72-31A

12. Sponsoring Organization Name and Address
    DHEW, PHS, OASH, National Center for Health Services Research
    3700 East-West Highway, Room 7-44 (STI)
    Hyattsville, Maryland 20782
    (Tel.: 301/436-8970)

13. Type of Report & Period Covered
    Proceedings
    Sept. 8-10, 1976

14.

15. Supplementary Notes
    DHEW Pub. No. (PHS) 78-3195.
    NCHSR publication of research findings does not necessarily represent approval or
    official endorsement of research findings by the National Center for Health Services
    Research or the Department of Health, Education, and Welfare.

16. Abstracts
    The focus of this conference was the importance of systematic research in evaluating
    the Emergency Medical Services system and administrative functions. Presentations
    made at the conference and compiled in this document deal with a range of conceptual
    and methodologic issues. Particular attention is given to the opposing yet mutually
    dependent roles of the administrator/evaluator. Several papers presenting aspects
    of research conducted in a police setting offer an instructive analogy to emergency
    medical services systems.

17a.

17b.
    Health services research
    Emergency medical services
    Research methodology

17c.

Releasable to the public. Available from National
Technical Information Service, Springfield, VA 22161
(Tel.: 703/557-4650)

19. Security Class (This Report)
    UNCLASSIFIED

20. Security Class (This Page)

21. No. of Pages
    Est. 125

22. Price






